I, Dr. Peter Vander Horn, being duly warned that willful false statements and the like are
punishable by fine or imprisonment or both, under 18 U.S.C. § 1001, and may jeopardize the 
validity of the patent application or any patent issuing thereon, state and declare as follows: 

1. All statements herein made of my own knowledge are true and statements made on
information or belief are believed to be true. The Exhibits (1-10) attached hereto are
incorporated herein by reference.

2. I received a Ph.D. in microbiology from Cornell University in 1991. A copy of my
curriculum vitae is attached as Exhibit 1.

3. I am presently employed by MJ Bioworks, Inc. as Vice President of Research,
Development, and Engineering. I am primarily responsible for supervising research teams
working to improve our scientific instrumentation products. MJ Bioworks is the assignee of the
subject patent application.

4. I have read and am familiar with the contents of the application. As I understand the
bases for the outstanding rejections, the Examiner believes that the pending claims are overly 
broad and that it would take undue experimentation to identify members of the genus of non- 
specific double-stranded nucleic acid binding domains that are either recognized by polyclonal 
antibodies generated against Sso7d or have at least 50% identity to a 50 amino acid subsequence 
of Seq. ID No: 2 or a 75% identity to Sac7d. 



Sir: 
5. The criteria set forth in the claims were intended to provide us with claim scope that
embraces both naturally occurring proteins in the family of non-specific DNA binding Archaeal
proteins and "Archaeal 7 kDa muteins". By Archaeal 7 kDa muteins, I am referring to
man-made recombinantly produced proteins that are derived from naturally occurring proteins.
In this context, muteins differ from their parent proteins by the introduction of amino acid
changes where those changes do not markedly alter their DNA binding properties compared to the
parent protein.

6. It is the intent of this declaration to explain, with objective scientific reasons, why one of
skill can identify working embodiments that fall within the scope of these claims with routine
experimentation. In summary, there are three objective reasons and one subjective reason. The
three objective reasons are: (i) that genetic variation or drift within the naturally occurring
species of Archaeal 7 kDa proteins provides an initial road map for point mutations; (ii) that
conventional knowledge of protein chemistry allows us to predict that biological properties
can be preserved so long as amino acid substitutions are conservative in nature; and (iii) that
knowledge of the three dimensional structure of these proteins when bound to DNA permits us to
predict areas of non-criticality where substitutions may be freely introduced beyond mere
conservative substitutions. As a subjective rationale, we must consider that the family of
Archaeal 7 kDa proteins comes from extremophilic bacteria that live in acidic environments above
the melting temperature of DNA. This group of extremophiles includes many unexplored species
that, by virtue of their habitats, are expected to have Archaeal 7 kDa-like DNA binding proteins.
With so many species to be studied and so few cultured, it is highly probable that additional
members of the family will be discovered with even greater variation than those that are
presently known and sequenced.

7. NATURAL VARIATION.

With regard to naturally occurring 7 kDa proteins in the family of Archaeal DNA-binding
proteins, there are many family members reported in the literature. It is an accepted convention
that sequence matches with E values below 0.01 are unlikely to occur by chance and are therefore
statistically related. Using Sso7d as a prototype, we studied the family of Archaeal DNA binding
proteins reported in GenBank. We noted that there are at least 17 related members of the 7 kDa
class of Archaeal proteins, the least related of which has an E value of 9x10^[exponent illegible].

The evolutionary relationship between the members of this family is made quite clear
when one conducts a BlastP search comparing Sso7d to its family members. Using the default
parameters provided by the specification on page 16, lines 7-11, with the "Low Complexity"
filter set to off to permit us to align the entire 63 amino acids, we get the following results:
No.  Protein                  Identity  Similarity  Sequence
     SEQ ID: 2                                      ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK
1)                            90%       95%         atvkfkykgeekqvdiskikkvwrvgkraisftydegggktgrgavsekdapkellqrnmpetgkyfrhklpddypi
2)   Sso7d                    100%      100%        meismatvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
3)   Sso7d                    98%       100%        matvkfkykgeekqvdiskikkvwrvgkraisftydegggktgrgavsekdapkellqralakqkk
4)   Sso7d                    100%      100%        matvkfkykgeekevdiakikkvwrvgkmisftydegggktgrgavsekdapkellqralekqkk
5)   Sso7d                    98%       100%        atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqk
6)   Sso7d                    98%       100%        matvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqndekqkk
7)   Sso7d                    100%      100%        atvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
8)   Sso7d                    98%       100%        atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
9)   Ssh7B                    98%       98%         mvtvkfkykgeekevdtskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
10)  Sso7d mutant             96%       98%         atvkfkykgeekqvdiskikkvwrvgkmisatydegggktgrgavsekdapkellqmlekqk
11)  Sso7e/Sto7c              91%       93%         mvtvkfkykgeekevdiskikkvwrvgkmisftydd-ngktgrgavsekdapkellqmleksgkk
12)  Sac7a                    86%       91%         vkvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlarae
13)  Sac7a/b/d                81%       90%         nrvkvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkellditaaraerekk
14)  Sac7e                    79%       88%         makvrfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelmdmlaraekkfc
15)  1SAP/Sac7                86%       91%         kvkfkykgeekevdtskikkvwrvgkrovsftydd-ngktgrgavsekdapkelldmlaraerekk
16)  Sac7e                    79%       88%         akvrfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelmdmlaraekkk
17)  Sso DNA binding protein  92%       94%         tvkfkykgeekqvdiskikkvxrvgkmisftydegxgk


From the above BLASTP data, we can see that the natural variation within the family 
extends to below 80% identity. At a minimum, it was the applicants' intent to encompass in a 
single claim all naturally occurring known variants of the DNA binding Archaeal protein family. 
But our knowledge of variants can be extended to include muteins by applying our knowledge of 
protein chemistry - knowledge that is both routine and predictable in its application. 

8. MUTEINS CREATED BY COMBINING NATURALLY OCCURRING
VARIATION.

Muteins of Archaeal 7 kDa proteins can be readily created by those of skill by exploiting
variation within the natural members of the family to create novel combinations of variations. In
essence, the naturally occurring members are a road map for distinguishing the critical amino
acids from the non-critical amino acids.
A cursory review of the family reveals that the amino and carboxyl termini are not critical
to the functionality of these proteins. The amino and carboxyl ends are very tolerant of 
substitutions and additions. They are sites of divergence between the homologues and the 
invention. As evidence of the robust nature of these proteins, we placed entire polymerase 
domains on both the carboxyl and the amino ends without interfering with binding. This was Dr. 
Wang's rationale for claiming sequence similarity to a 50-amino acid subsequence, rather than to 
the entire protein. Biological functionality appears to be determined by the conserved amino 
acids that form the internal core of these proteins (see Choli et al. (1988) Biochimica et 
Biophysica Acta, 950:193-203 at 202) (Exhibit 2). But even there the identity is not 100%. 

9. MUTEINS CREATED BY INTRODUCTION OF CONSERVED
SUBSTITUTIONS.

In addition to introducing combinations of naturally occurring variations into a
prototype 7 kDa binding protein, those of skill can also substitute conserved amino acids for
naturally occurring ones that have not been found to vary in nature. Classic examples of such
pairings are lysine and arginine, alanine and glycine, glutamine and asparagine, and aspartic acid
and glutamic acid, all of which appear in this family of proteins. For example, natural
variations are known at 12 of the 63 residues of Sso7d. By substituting conserved
amino acids at another 20 residues, we can easily produce a non-specific 7 kDa Archaeal
mutein that would almost certainly work to improve processivity of a polymerase.
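
The arithmetic implied by the preceding paragraph can be sketched as follows. This is only an illustration, not a computation the declaration itself performs: it assumes that all 12 naturally variable positions and all 20 further substituted positions differ from SEQ ID No. 2 at the same time and that no insertions or deletions are made. The function names are hypothetical.

```python
# Conservative pairings named in the declaration, treated as symmetric.
CONSERVATIVE_PAIRS = {
    frozenset("KR"),  # lysine / arginine
    frozenset("AG"),  # alanine / glycine
    frozenset("QN"),  # glutamine / asparagine
    frozenset("DE"),  # aspartic acid / glutamic acid
}


def is_conservative(a: str, b: str) -> bool:
    """Return True if replacing residue a with b is one of the listed pairings."""
    return a != b and frozenset((a.upper(), b.upper())) in CONSERVATIVE_PAIRS


def percent_identity_after(length: int, changed: int) -> float:
    """Percent identity to the parent when `changed` of `length` positions differ."""
    if not 0 <= changed <= length:
        raise ValueError("changed must be between 0 and length")
    return 100.0 * (length - changed) / length


# Assumption: 12 naturally variable + 20 conservatively substituted positions,
# all changed at once in the 63-residue Sso7d sequence.
print(is_conservative("K", "R"))
print(round(percent_identity_after(63, 12 + 20), 1))
```

Under those assumptions the resulting mutein would retain roughly half of the parent residues, which is the range the claims recite.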

10. MUTEINS DERIVED FROM STUDIES OF THREE DIMENSIONAL
ANALYSES.

We need not limit our muteins to combinations of naturally occurring amino acid
variations nor to those that are unnatural but between amino acids of similar chemical properties.
This is because the three dimensional structure of these proteins when interacting with DNA is
known. See Exhibit 3 (Gao et al.).

Knowledge of three dimensional features provides yet another strategy permitting protein 
chemists to engineer away from the native sequences because it provides structural activity 
relationships between the protein domains and DNA. Knowing which domains play a role in 
DNA binding and which are non-critical for binding permits us to think beyond mere 
conservative amino acid substitution and to allow for Archaeal 7 kDa muteins with lower percent 
identities than if we confined our mutein development strategy to the first two objective 
approaches. 

Attached to this declaration as Exhibits 4-8 are enlargements of figures derived from the
data of Gao et al. with an accession number of 1BNZ. Exhibit 4 is a ribbon diagram of the
crystal structure of Sso7d bound to DNA. The beta sheets of the protein are in yellow, the alpha
helix is in green. Unstructured regions are in blue.

These figures are derived from the protein crystal coordinates that Gao
submitted to the protein structure database. Submission is a requirement
similar to the requirement that sequences be deposited into GenBank with an
accession number. The accession code for Sso7d protein bound to DNA is 1BNZ.
The coordinates are viewed and turned into these figures using the program
Cn3D, which is freely available at
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure.

As predicted, the unstructured regions are sites where divergences from Sso7d among the
group of related proteins cluster. One skilled in the art could place additional insertions into these
sites that will decrease sequence identity in BLAST analyses. For example, a thermostable loop can
be placed in the G37, G38, G39 turn.

In addition, the entire alpha helix (green) is highly mutable. This is evidenced by the fact
that a great deal of natural variation of the homologs is observed in this domain. It should be
noted that the naturally occurring mutations in this domain do appear to preserve the presence of
an alpha helix and that this region does not interact with the DNA substrate. Therefore, additional
mutations could be introduced into the alpha helix (as long as they preserve the secondary
structure) and serve to further lower the amino acid sequence identity compared to SEQ ID 2.

Using the three dimensional figures, those of skill could also take note that the differences
in composition and length between Sso7 and Sac7 proteins cluster in the turns between beta
sheets and in amino acids facing away from the DNA binding domain in the crystal structure.
So these domains are also areas of plasticity.

The papers cited in the patent application describe several exposed lysine residues that are
methylated in vivo. These sites are not involved in DNA binding but appear to be regulatory. As
our work is independent of bacterial gene regulation, these lysines could be mutated so long as
they do not interact with the DNA substrate. As can be seen in Exhibits 5 through 8, many of
these lysine residues project away from the domain and do not interact with DNA. These
residues are excellent candidates for mutagenesis. One skilled in the art would recognize that
these could be changed to arginine residues without affecting DNA binding.

I was able to find 10 such sites by examining the crystal structure. Exhibit 5 shows lysines
19, 40, 49, and 53 projecting away from the DNA binding surface of the protein. Exhibit 6 also
shows lysines 49, 61, and 64. Exhibit 7 shows lysine 63 and Exhibit 8 shows lysines 5 and 13. K
to R derivatives already exist for positions 5 and 61, validating this approach. No divergence
from the Sso7d sequence has been observed for the remaining 8 lysines, probably because of the
regulatory role alluded to earlier. Mutating these lysines can yield an additional 8 differences
from SEQ ID No. 2, or 13%.
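
The lysine-to-arginine strategy described for Exhibits 5 through 8 can be checked against the sequence with a short sketch. It assumes that the residue numbers count the initiator methionine (the 64-residue Sso7d sequence as written in the declaration's Sso7d/Sac7d alignment); that numbering is an inference, and the helper name is hypothetical. As transcribed, the exhibits name nine distinct positions, and the sketch uses exactly those.

```python
# Sso7d including its initiator methionine (64 residues).
SSO7D = "MATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK"

# Distinct lysine positions named for Exhibits 5-8 (1-based).
# Assumption: the numbering counts the initiator Met.
EXHIBIT_LYSINES = [5, 13, 19, 40, 49, 53, 61, 63, 64]


def k_to_r(seq: str, positions: list[int]) -> str:
    """Replace lysine with arginine at each 1-based position, verifying each is K."""
    residues = list(seq)
    for pos in positions:
        if residues[pos - 1] != "K":
            raise ValueError(f"position {pos} is {residues[pos - 1]}, not K")
        residues[pos - 1] = "R"
    return "".join(residues)


mutant = k_to_r(SSO7D, EXHIBIT_LYSINES)
differences = sum(a != b for a, b in zip(SSO7D, mutant))
print(differences)

# The declaration's own figure: 8 further differences out of the 63 residues
# of SEQ ID No. 2, expressed as a rounded percentage.
print(round(100 * 8 / 63))
```

Every listed position is a lysine under this numbering, and 8 changes out of 63 residues rounds to the 13% stated above.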
For these varied but objective reasons, one skilled in the art could, with a combination of
conserved substitutions, insertions, deletions, and exchanges of mutable sites, construct DNA
binding proteins that are very divergent from SEQ ID: 2 and Sac7d. I will discuss specific
percentages later in this Declaration.

11. OTHER EXTREMOPHILES WILL HAVE ARCHAEAL 7 kDa LIKE PROTEINS.

Beyond the objective reasons presented above, there is a subjective reason why a
percentage below 90% is needed to avoid routine engineering around the presently issued claims.
Many Archaeal 7 kDa proteins have already been reported as of today, and it
should be noted that these proteins are very abundant in Sulfolobus species. In fact, they are
probably abundant in any organism that has to live in acid at >70°C chemolithotrophically. Here
are S. solfataricus's relatives, many of which are expected to contain Sso7d-related proteins.

Archaea; Crenarchaeota; Thermoprotei; Sulfolobales
  Sulfolobaceae
    Acidianus
      Acidianus ambivalens
      Acidianus brierleyi
      Acidianus infernus
      Acidianus tengchongensis
    Metallosphaera
      Metallosphaera prunae
      Metallosphaera sedula
      Metallosphaera sp. GIB11/00
      Metallosphaera sp. J1
      Metallosphaera sp. TA-2
      environmental samples
        uncultured Metallosphaera sp.
    Stygiolobus
      Stygiolobus azoricus
      environmental samples
        uncultured Stygiolobus sp.
    Sulfolobus
      Sulfolobus acidocaldarius
      Sulfolobus islandicus
      Sulfolobus metallicus
      Sulfolobus shibatae
      Sulfolobus solfataricus
      Sulfolobus thuringiensis
      Sulfolobus tokodaii
      Sulfolobus yangmingensis
      Sulfolobus sp.
      Sulfolobus sp. AMP12/99
      Sulfolobus sp. CH7/99
      Sulfolobus sp. FF5/00
      Sulfolobus sp. MV2/99
      Sulfolobus sp. MVSoil3/SC2
      Sulfolobus sp. MVSoil6/SC1
      Sulfolobus sp. NGB23/00
      Sulfolobus sp. NGB6/00
      Sulfolobus sp. NL8/00
      Sulfolobus sp. NOB8H2
      Sulfolobus sp. RC3
      Sulfolobus sp. RC6/00
      Sulfolobus sp. RCSC1/01
      Sulfolobus sp. RT8-4
      environmental samples
        uncultured Sulfolobus sp.
    Sulfurisphaera
      Sulfurisphaera ohwakuensis

So far only Sulfolobus solfataricus and Sulfolobus tokodaii genomes have been sequenced. 

Given the range of divergence in Archaeal 7 kDa DNA binding proteins set forth above from a 
tiny portion of species sequenced, it will be trivial to find additional species of these DNA 
binding proteins that will have 70% or less homology to the presently known prototypes. 

12. THE 90% LIMITATION OF THE PATENT INVITES THOSE OF SKILL TO
ENGINEER AROUND THE CLAIMS WITH EASE.

Let's look more specifically at the information that was available prior to filing the 
subject application. Dr. Wang's earlier patent US Pat. No. 6,627,424 ['424] issued with claims 
covering 90% identity to Sso7d and identity to Sac7d. Below I have created a paired table 
comparing the relative homology between Sso7d and Sac7d and Sac7d and Sac7e. 

As you can see, close relatives of Sso7d, (i.e., Sac7a,b,d and e) are not covered by the 
recited percentage in our '424 patent claims. But a pair-wise alignment of these sequences to the 
two specific examples gives one a clear road map to implementing the invention with any of the 
naturally occurring homologues. 

Sso7d alignment to Sac7d.

Sso7d: 1 MATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQML---EKQKK 64
         M  VKFKYKGEEKEVD SKIKKVWRVGKM+SFTYD+  GKTGRGAVSEKDAPKELL ML   E++KK
Sac7d: 1 MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEREKK 66

Identity: 80%   Similarity: 85%

Note: the percent identity changes to 82% and the similarity changes to 88% if Seq ID 2 is used. This is because Seq
ID 2 is Sso7d without the MET. One skilled in the art would study the entire sequence.

Sac7d aligned to Sac7e (not covered in the '424 patent because it is 79% identical to Seq ID 2).

Sac7d: 1 MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELLDMLARAEREK 65
         M KV+FKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKEL+DMLARAE++K
Sac7e: 1 MAKVRFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELMDMLARAEKKK 65

Identity: 92%   Similarity: 98%

Note: A 49 amino acid core sequence is completely identical.
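
The identity figure and the 49-residue core for the Sac7d/Sac7e pair can be recomputed directly from the two aligned sequences. The sketch below is illustrative only: it assumes percent identity is taken over the full alignment length with gap columns counted as mismatches, and the function names are hypothetical. It does not attempt the similarity figure, which depends on a substitution matrix.

```python
SAC7D = "MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELLDMLARAEREK"
SAC7E = "MAKVRFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELMDMLARAEKKK"


def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of alignment length (gaps never match)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive identical columns."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if (x == y and x != "-") else 0
        best = max(best, run)
    return best


print(round(percent_identity(SAC7D, SAC7E)))
print(longest_identical_run(SAC7D, SAC7E))
```

Sixty of the 65 aligned positions match, and the longest uninterrupted identical stretch runs from residue 6 through residue 54.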
Finding alternative species not covered by the allowed claims of the '424 patent, whether
the above recited naturally occurring species or man-made muteins, is a trivial exercise for one
skilled in the art. No reasonable protein chemist looking at this data would doubt that Sac7e
could increase the processivity of polymerases if traded out for Sac7d in the constructs in SEQ ID
No. 9 and SEQ ID No. 10 of the '424 patent.

It is also helpful to take note that three of the references Dr. Wang cited in the patent
(Choli et al., Exhibit 2; Baumann et al., Exhibit 9; and McAfee et al., Exhibit 10) contain
figures with sequence alignments of Sso7d homologues including Sac7d, Sac7a, and Sac7e.
They are repeatedly described as structurally and functionally closely related proteins. The Sac7d
construct (figure 2 of the application) was made to support the contention that these homologues
would work. Dr. Wang clearly knew about and taught that these proteins would work in the
invention. No one skilled in the art who reads the patent specification and the referenced papers
would have objective reasons to think they would not work.

For these reasons, I submit that a 79% identity to Sso7d using naturally occurring 
variants is clearly enabled by the specification. 

13. ROUTINELY INTRODUCING NON-NATURAL VARIATIONS LOWERS THE
PERCENTAGE BELOW 79%.

Using natural variants as a road map, a 79% identity is readily available. But man-made
modifications can take this 79% identity lower. One can go lower in percent identity by merely
combining known deviations from Sso7d. Using the family of Sac7 proteins as a road map, one
obtains the following hybrid sequence:

Hypothetical 7d: mvkvkvrfkykgeekqvdtskikxvgrvgkmvsatyddngktgrgavsekdapkelldmlaraerekk

One known Sso7d divergence was not included in this alignment. The F34A
mutation was not included because it is known to destabilize the protein. All
other divergences are from functional proteins.

The hypothetical protein 7d is 76% identical to Sso7d as shown in the alignment below.

Sso7d:    ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQML---EKQKK 64
            V+FKYKGEEK+VD SKIKKV RVGKM+SFTYD+  GKTGRGAVSEKDAPKELL ML   E++KK
Hypothet: VKVRFKYKGEEKQVDTSKIKKVGRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEREKK 65

14. COMBINING ALL THE INFORMATION WILL LEAD ONE OF SKILL TO
MUTEINS HAVING LESS THAN 60% SEQUENCE IDENTITY TO SAC7d.

Combining all of these changes together, one can get a functional derivative of SEQ ID No.
2 with less than 60% amino acid identity in a BLAST search. One example of such a protein
sequence is below.

VKVRVRFKYK GEERQVDTSR IRKVGRVGKM VSATYDDACA AACNGRTGRG AVSERDAPRE LLDMLARAER
ERR

We have identified other muteins of Sso7d that enhance polymerase performance. For
instance: I to V, T to N, D to Y, and [residue illegible] to K in the above sequence (the
superscript residue positions are illegible in this copy). With one exception (I to V), these
are not conserved changes; but they are changes that do not affect core structures.

When all this information is combined, it would be straightforward to identify muteins with 
less than 60% identity to Sso7d that would still enhance polymerase performance. 

15. THE ARCHAEAL 7 kDa PROTEINS ARE AN ANCIENT PROTEIN AND
EXISTING EVOLUTIONARY DRIFT ESTABLISHES THE HIGH PROBABILITY THAT
MUTEINS WITH 50% IDENTITY TO ANY KNOWN SPECIES CAN BE CREATED.

From an evolutionary perspective, this family of thermostable DNA binding proteins is
apparently quite ancient. There is a restriction endonuclease from Methanococcus jannaschii
(results below), another archaeon, that a BLAST search of the Swiss-Prot database with Seq ID
No. 2 will identify. The 42% identity of this DNA binding protein to Sso7d indicates that the
DNA binding domain has been around for a long time and that with routine sequencing of
genomes from the Archaeal family there will be many easily obtainable proteins with even less
than 50% identity to Seq ID No. 2 that will work in the invention.

> p\ 1 0954528lrcnNP 044 167.11 M. jannaschii predicted coding region MJECL41 [Methanococcus jannaschii] 

gii { 2229988I$dIO60296|T 1 SUM L-TJA Putative type 1 restriction enzyme MjaXP specificity protein (S 
protein) (S.MjaXP) 

gil2l290S4!piriiH64S14 hypothetical protein MJECWl - Methanococcus jannaschii plasmid 
pURBSOO 

pin 522674it;b!AAC37 110. I I M. jannaschii predicted coding region MJECL41 [Methanococcus 
jannaschii] 
Length = 432 

Score = 30.0 bits (66), Expect = 8.5

Identities = 19/45 (42%), Positives = 24/45 (53%), Gaps = 1/45 (2%)

Query: 3 VKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSE 47
         VKF+++ E KE DI KI K W V K I    +  GG T    + E
Sbjct: 5 VKFRWETEFKETDIGKIPKDWDV-KKIKDIGEVAGGSTPSTKIKE 48
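
The percentages in the BLAST summary line above follow from the reported fractions, which a reader can confirm with a short sketch. The parser is hypothetical and handles only the summary line exactly as reproduced here. The integer division used to recompute each percentage is an assumption about how the figures were rounded; for these three fractions, ordinary rounding gives the same values.

```python
import re

SUMMARY = "Identities = 19/45 (42%), Positives = 24/45 (53%), Gaps = 1/45 (2%)"


def parse_blast_summary(line: str) -> dict[str, tuple[int, int, int]]:
    """Map each statistic name to (numerator, denominator, reported percent)."""
    pattern = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")
    return {
        name: (int(num), int(den), int(pct))
        for name, num, den, pct in pattern.findall(line)
    }


stats = parse_blast_summary(SUMMARY)
for name, (num, den, reported) in stats.items():
    recomputed = 100 * num // den  # integer percentage
    print(name, f"{num}/{den}", reported, recomputed)
```

The recomputed values agree with the reported ones, so the identity for this hit is 19 of 45 aligned positions.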

Having provided multiple objective roadmaps to the creation of muteins, it needs to be 
said that actual function is always subject to empirical determination. To determine if the 7 kDa 
Archaeal muteins function as desired, the Examiner is asked to take note of the generic assay for 
DNA binding described on pages 18-19 of the specification. Here, the inventors present a 
generic method for readily and conveniently testing for operable species. 
Based on the objective reasons set forth above, I submit that the creation of Archaeal 7
kDa muteins having 60 to 50% identity to native Archaeal 7 kDa proteins is a matter of routine
experimentation.

16. DEFINING THE PROTEINS BY THEIR ABILITY TO BIND TO ANTIBODIES 
GENERATED AGAINST A PROTOTYPE LIMITS THE PRIMARY AMINO ACID TO 
DEFINED STRUCTURE. 

In addition to defining the invention by a percent identity, an alternative scope of claim 
protection was presented where the DNA binding proteins were defined as those recognized by 
polyclonal antibodies generated against specific Archaeal 7 kDa DNA binding proteins. The 
Examiner has rejected claims directed to non-specific double-stranded nucleic acid binding 
domains that are recognized by polyclonal antibodies generated against Sso7d. As I understand 
the rejection, the Examiner believes that the scope of this claim encompasses too many non- 
operable species to be considered allowable. 

In the first instance, I would like to point out that the scope of proteins encompassed by 
the language is more limited than the claims where the proteins have 50% identity. 

The use of immuno-crossreactivity to define proteins as related or unrelated is an old and
well-recognized art. The specification, at pages 16-18, provides a routine and conventional
means to compare unknown proteins with known proteins.

In addition, it is well-known in the art to use antisera as identification reagents to clone
genes, based on the expression of a protein mediated by an expression vector. If the library
source is one of the naturally-occurring relatives of Sulfolobus solfataricus listed above, the
probability that any cross-reacting gene obtained from the library would function to increase the
processivity of polymerases is very high.

But naturally occurring proteins are not the only proteins that would be expected to cross
react with polyclonal antisera generated against the prototype Archaeal 7 kDa proteins. One
could easily envision muteins that would retain immuno-crossreactivity. To the extent that some
may lack function, those inoperable embodiments could be rapidly distinguished from operable
species using the prescribed assay set forth in the specification.

When these teachings are coupled with the generic assay for testing functionality of the
proteins to non-specifically bind to DNA (see the specification at pages 18 and 19), I submit that
there is no objective reason to doubt that the identification of many operable species with 50% or
greater sequence identity with Sso7d or Sac7d with polyclonal antibodies specific to the two
prototypes would be routine and expected.
This Declarant has nothing further to say.



Attachments: Exhibits 1-10
EXHIBIT 2

Isolation, characterization and microsequence analysis of a small basic
methylated DNA-binding protein from the Archaebacterium,
Sulfolobus solfataricus

Theodora Choli, Petra Henning, B. Wittmann-Liebold and Richard Reinhardt

Abteilung Wittmann, Max-Planck-Institut für Molekulare Genetik, Berlin (Germany)
(Received 18 December 1987)

Key words: DNA binding protein; Radius of gyration; Amino acid methylation; Microsequence analysis;
(S. solfataricus)

DNA-binding proteins have been extracted from the thermoacidophilic archaebacterium Sulfolobus
solfataricus strain P1, grown at 86°C and pH 4.5. These proteins, which may have a histone-like function, were
isolated and purified under standard, non-denaturing conditions, and can be grouped into three molecular
mass classes of 7, 8 and 10 kDa. We have purified to homogeneity the main 7 kDa protein and determined its
DNA-binding affinity by filter binding assays and electron microscopy. The Stokes radius of gyration
indicates that the protein occurs as a monomer. The complete amino-acid sequence of this protein contains
14 lysine residues out of 63 amino acids and the calculated M_r is 7149. Five of the lysine residues are
partially monomethylated to varying extents and the methylated residues are located exclusively in the
N-terminal (positions 4 and 6) and the C-terminal (positions 60, 62 and 63) regions only. The protein is
strongly homologous to the 7 kDa proteins of Sulfolobus acidocaldarius with the highest homology to protein
7d. Accordingly, the name of this protein from S. solfataricus was assigned as DNA-binding protein Sso7d.

Abbreviations: TPCK, N-tosylamido-2-phenylethylchloromethyl ketone; DABITC,
4-N,N-dimethylaminoazobenzene-4'-isothiocyanate; SSC, 0.15 M trisodium citrate/0.015 M
NaCl (pH 7.0); PMSF, phenylmethylsulphonyl fluoride; BSA,
bovine serum albumin; PTH, phenylthiohydantoin.

Correspondence: T. Choli, Max-Planck-Institut für Molekulare
Genetik, Abteilung Wittmann, Ihnestr. 73, D-1000 Berlin 33
(Dahlem), Germany.

Introduction

The mode of packing for eukaryotic DNA is well established. A set of small basic proteins, the
histones, are involved in the formation of compact DNA-protein particles which contain the
double-helical DNA coiled around an octameric histone complex [1]. In bacteria, the mechanism for
folding the long circular DNA molecule into a compact form is much less clear. Although a number
of proteins have been implicated for this function [2], a precise description of the composition of
'bacterial chromatin' is not yet available.

Although the structure and composition of the bacterial nucleoids are not very well defined, there
is compelling evidence that bacterial DNA is folded into a compact complex [3,4] through the
participation of at least three proteins [5]. In recent years, several histone-like DNA-binding
proteins have been isolated from eubacteria, called NS1 and NS2, HU, HD or DNA-binding protein
II. Their amino-acid sequences have been determined and are currently under further
investigation [6-10]. Significant homologies have been found between the eubacterial proteins and the
first protein isolated from the archaebacterium Thermoplasma acidophilum (for reference see Ref.
8). Previously, at least two groups of DNA-binding proteins with estimated molecular masses of 9
kDa and 6 kDa were found in several Sulfolobus species [11]. From our results it has become clear
that Sulfolobus acidocaldarius contains several DNA-binding proteins of similar sizes with M_r
values of 7000, 8000 and 10000 [12,13], of which the predominant protein, 7d [14], and three of the
minor components (proteins 7a, 7b and 7e) have been sequenced recently [15].

In this paper we present the isolation, characterization and primary structure determination
of the predominant 7 kDa protein from Sulfolobus solfataricus strain P1 and compare its sequence
with that of the other known bacterial DNA-binding proteins. Our nomenclature for these proteins
in the 7 kDa class is based on the increased basicity of the proteins in the order 7a to 7e due
to their charge differences [12]. To avoid confusion, it should be pointed out that the primary
structure of the dominant 7 kDa protein from S. acidocaldarius DSM 1616 has been determined
[14], but at that time the organism was named Sulfolobus solfataricus DSM 1616. Comparison of
DNA-binding proteins, characterization of ribosomal proteins by two-dimensional gel
electrophoresis and the immunological characterization of RNA-polymerase subunits had
demonstrated clearly that the strain DSM 1616 is similar although not identical to S. acidocaldarius DSM
639 and different from other S. solfataricus strains [13]. Therefore, this strain was renamed S.
acidocaldarius DSM 1616.

Experimental procedures

Materials

Sodium dodecylsulfate (SDS) was obtained from Serva (Heidelberg, F.R.G.). TPCK trypsin
was obtained from Worthington (Freehold, NJ, U.S.A.). DABITC was from Fluka (Buchs,
Switzerland), and recrystallized from boiling acetone. Ovalbumin, chymotrypsinogen A,
myoglobin, cytochrome c and bovine trypsin inhibitor were from Serva (Heidelberg, F.R.G.).
The scintillation cocktail was Beckman Ready-Solv from Beckman (Berkeley, CA, U.S.A.). All
solutions used for protein purification contained 0.1 mM PMSF, 0.1 mM benzamidine hydrochloride
and 6 mM 2-mercaptoethanol. N^ε-monomethyllysine and the other methylated lysine derivatives
were purchased from Serva and Calbiochem (Frankfurt, F.R.G.). Acetonitrile and 2-propanol
for HPLC solutions were of LiChrosolv grade and all other chemicals were of pro analysis grade
purchased from Merck (Darmstadt, F.R.G.).

Methods

S. solfataricus strain P1 was obtained from W. Zillig (Munich), and cells were grown at 86°C
under conditions described in Ref. 12, with the addition of 1 g per liter casamino acids (Difco,
Detroit, MI, U.S.A.) to the medium.

Purification of the DNA-binding protein. S. solfataricus cells were suspended in Polymix-Hepes
buffer [16]. After addition of DNAase I (RNAase free), the cells were broken twice in a
Gaulin-Manton press (General Electric, Fort Wayne, IN, U.S.A.) at 72 MPa (9000 lb/inch²). Cellular
debris was removed by centrifugation (1.5 h at 10000 × g) and the salt concentration of the
supernatant was raised to 1 M NH4Cl. Ribosomes were separated from smaller proteins by
centrifugation overnight at 160000 × g. The supernatant was dialysed against 10 mM phosphate buffer at
pH 6.0 and applied onto a CM-Sepharose CL-6B column (5 × 40 cm). Proteins were eluted with a
linear NaCl gradient from 0.05 to 0.8 M in 10 mM phosphate buffer at pH 6.0 (20 l, flow rate 100
ml/h); 30 ml fractions were collected and assayed for protein content by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE). Further purification was obtained by gel filtration on Sephadex G-50
superfine in 0.35 M NaCl and additionally by ion-exchange chromatography on Fractogel TSK
CM-650 (S) with a linear NaCl gradient from 0.1 to 0.5 M.

Proteins were checked for purity and identified
by slab gel electrophoresis in the presence of SDS.

Determination of Stokes radii. Stokes radii of 
gyration, R_s, were determined by analytical gel 
filtration on a Sephadex G-50 superfine column 
(1.7 × 190 cm) in 0.35 M NaCl/20 mM phosphate 
buffer (pH 7.0). The flow rate was 12 ml/h and 
the absorption at 230 nm was recorded continuously. 
The distribution coefficient, K_D, was calculated 
from the void volume (V_0, determined with 
Dextran blue (2000)), the total available volume 
(V_t, determined with benzamidine hydrochloride), 
and the elution volume (V_e). The calibration line 
for Stokes radii was obtained by plotting the 
inverse error function of (1 - K_D) against R_s, as 
described by Ackers [17]. The column was 
calibrated using the following proteins as markers: 
ovalbumin (3.0 nm), chymotrypsinogen A (2.2 nm), 
myoglobin (1.9 nm), cytochrome c (1.61 nm) and 
bovine trypsin inhibitor (1.45 nm). 
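
The calibration just described can be sketched numerically. The Python sketch below assumes the usual form of the Ackers treatment, in which K_D = (V_e - V_0)/(V_t - V_0) and the inverse error function of (1 - K_D) is linear in the Stokes radius. The marker radii are those listed above; every volume is a hypothetical placeholder, not a measurement from this study, and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Marker Stokes radii (nm) as listed in the text.
MARKERS_NM = {
    "ovalbumin": 3.0,
    "chymotrypsinogen A": 2.2,
    "myoglobin": 1.9,
    "cytochrome c": 1.61,
    "bovine trypsin inhibitor": 1.45,
}

# HYPOTHETICAL volumes in ml, chosen only to make the sketch runnable.
V0, VT = 150.0, 420.0
HYPOTHETICAL_VE = {
    "ovalbumin": 165.0,
    "chymotrypsinogen A": 205.0,
    "myoglobin": 225.0,
    "cytochrome c": 247.0,
    "bovine trypsin inhibitor": 260.0,
}


def inverse_erf(x):
    """erf^-1(x) for -1 < x < 1, using erf(z) = 2*Phi(z*sqrt(2)) - 1."""
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / sqrt(2.0)


def distribution_coefficient(v_e, v_0, v_t):
    """K_D from elution, void and total available volumes."""
    return (v_e - v_0) / (v_t - v_0)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(markers, elution, v_0, v_t):
    """Fit erf^-1(1 - K_D) against the marker Stokes radii."""
    radii = [markers[name] for name in markers]
    values = [
        inverse_erf(1.0 - distribution_coefficient(elution[name], v_0, v_t))
        for name in markers
    ]
    return fit_line(radii, values)


def stokes_radius(v_e, v_0, v_t, slope, intercept):
    """Read a Stokes radius (nm) off the calibration line."""
    value = inverse_erf(1.0 - distribution_coefficient(v_e, v_0, v_t))
    return (value - intercept) / slope


slope, intercept = calibrate(MARKERS_NM, HYPOTHETICAL_VE, V0, VT)
print(round(stokes_radius(255.0, V0, VT, slope, intercept), 2))
```

With real elution volumes in place of the placeholders, the same line would give R_s for an unknown protein from its elution volume alone.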

Filter binding assays. The filter binding assay 
described in Ref. 18 was modified according to 
Ref. 13. A fixed amount of ³H-labeled DNA and 
increasing amounts of protein were incubated in 
0.1 × SSC buffer, but containing 0.25 M NaCl, for 
15 min at 37°C. DNA-protein complexes were 
collected onto Millipore filters (0.45 µm; Milford, 
MA, U.S.A.) which were presoaked for 1 h at 
22°C in 10 mM KCl/1 mM EDTA/5 mM 
2-mercaptoethanol/50 µg/ml BSA. The complexes 
were washed three times with 3 ml portions of 
0.1 × SSC buffer containing 0.25 M NaCl and 
quantified by liquid scintillation counting (Beckman 
LS 7000). The DNA-binding affinity of the 
examined proteins was expressed in percent, referring 
to the 100% sample of [³H]DNA without 
protein content. 

Gel-filtration binding experiments. DNA binding 
experiments using size exclusion chromatography 
on a Sephadex G-50 superfine column (2 × 50 cm) 
were carried out as described in Ref. 14. A fixed 
amount of Sulfolobus DNA and protein 7d was 
incubated for 15 min at 67°C in 'polymix' buffer 
[16]. 1 ml of the sample was injected into the 
column and comigration of the protein with DNA 
was established by analysis of the void volume 
peak by SDS gels. 

Electron microscopy studies. The formation of 
DNA-protein complexes and the preparation of 
samples for electron microscopy by adsorption to 
mica was performed as described in Ref. 19. Variable 
amounts of protein were incubated with 
double-stranded plasmid RSF 1010 and 
single-stranded φX 174 DNA in a buffer comprising 10 
mM triethanolamine-HCl/50 mM KCl/2.5 mM 
MgCl2/2.5 mM dithiothreitol (pH 7.5). Complexes 
were fixed with 0.2% (v/v) glutaraldehyde, 
adsorbed to mica and stained with 2% (w/v) 
aqueous uranyl acetate. Rotary shadowing was 
done with platinum-iridium (80 : 20) at an angle of 
about 8°. Electron micrographs were made with a 
Philips electron microscope, model EM 480. 

Enzymatic digestion with trypsin. The protein 
was digested with TPCK-trypsin (enzyme-to-substrate 
ratio, 1 : 50) in 100 mM N-methylmorpholine 
acetate buffer at pH 8.1 for 2 h at 37°C, with 
gentle stirring. The peptides were separated by 
reversed-phase HPLC (RP-HPLC) on a Vydac C18 
(201 TPB) column (250 × 4 mm) in dilute aqueous 
trifluoroacetic acid using an acetonitrile gradient. 

Cleavage with CNBr. Protein 7d (1 mg) was 
cleaved with 6 mg CNBr in 70% (v/v) formic acid 
for 48 h in the dark under nitrogen at ambient 
temperature. The peptides obtained were separated 
directly by RP-HPLC on a Vydac C4 (214 
TP54) column (250 × 4 mm) with a gradient of 
2-propanol in aqueous 0.1% trifluoroacetic acid, 
or with a Vydac C18 (201 TPB) column (250 × 4 
mm) with an acetonitrile gradient in aqueous 
trifluoroacetic acid. 

Sequence determination. Automatic sequencing 
of the intact protein was done in a liquid phase 
sequencer [20] with on-line detection of the 
PTH-amino acids [21] by isocratic HPLC employing a 
2-propanol HPLC solvent system [22] or in a 
pulsed gas-liquid phase sequencer [23] (Applied 
Biosystems, model 477A) with on-line detection of 
the PTH-amino acids by HPLC using a gradient 
system (Applied Biosystems PTH-analyzer, model 
120A). Sequence analysis of tryptic peptides was 
performed by manual microsequencing employing 
the DABITC/PITC double coupling method, and 
the amino-acid derivatives were identified by 
two-dimensional thin-layer chromatography 
[24,25]. DABTH-Leu and DABTH-Ile, which 
comigrate on the micro-TLC plates, were identified 
by isocratic HPLC [26]. The peptides obtained 
from cyanogen bromide cleavage which carried 
homoserine residues were sequenced in a solid 
phase sequencer employing the homoserine lactone 
attachment procedure [27,28]. 

Amino-acid analysis. Hydrolysis of the protein 
and peptides was performed in 100 µl 5.7 M HCl 
for 24 h at 110°C. The amino acids were determined 
after precolumn derivatization with 
o-phthaldialdehyde by RP-HPLC separation as 
described in Ref. 29. 
Results and Discussion 

The growth of S. solfataricus strain P1, breakage 
of cells and isolation of the DNA-binding 
proteins were performed as described in the 
Experimental procedures. Similar to S. acidocaldarius 
cells [12], three molecular weight classes of 
DNA-binding proteins of 7, 8 and 10 kDa have been 
isolated from S. solfataricus strain P1. The major 
component of the 7 kDa class is the DNA-binding 
protein 7d, according to the nomenclature used 
for the DNA-binding proteins from S. acidocaldarius [13]. 

Fig. 1a shows the protein separation on 
CM-Sepharose CL-6B. The fractions containing protein 
7d and an 8 kDa protein are marked. Further 
purification of protein 7d was performed by gel 
filtration on Sephadex G-50 and by ion-exchange 
chromatography on CM-Fractogel TSK as 
described in Experimental procedures (chromatograms 
not shown). Fig. 1b shows the purified 
protein 7d from S. solfataricus P1 on SDS-PAGE 
in comparison to 7 kDa DNA-binding proteins 
from S. acidocaldarius. 

Stokes radii of gyration 

The degree of asymmetry and oligomerisation 
of proteins is readily determined by analytical gel 
filtration [17]. This procedure allows the use of 
low protein concentrations in order to avoid 
artefacts such as protein aggregation. The relation 
between the Stokes radius, R_s, and the quaternary 



Fig. 1. (a) Separation of the DNA-binding proteins on 
CM-Sepharose CL-6B. Pooled fractions for protein 7d and an 8 
kDa protein are marked. The NaCl concentration was 
increased from 0.40 M to 0.49 M in phosphate buffer (pH 6.0) 
within the marked region. (b) Protein 7d derived from S. 
solfataricus (this paper) in comparison to 7 kDa proteins from 
S. acidocaldarius (this paper and Ref. 15). Lanes 1 and 6 show 
TP 50 marker proteins from S. solfataricus; lane 2, protein 7b 
from S. acidocaldarius; lane 3, protein 7c from S. 
acidocaldarius; lane 4, protein 7d from S. solfataricus; lane 5, 
protein 7d from S. acidocaldarius. 





TABLE I 

THE STOKES RADII OF GYRATION OF THE 7 kDa 
DNA-BINDING PROTEINS FROM S. ACIDOCALDARIUS (a) 
AND S. SOLFATARICUS (b) DETERMINED 
BY ANALYTICAL GEL FILTRATION [17]. 
The frictional ratio (f/f0) is calculated from the ratio of R_s 
and the radius of the equivalent sphere. 

                           f/f0 
             R_s (nm)      monomer    dimer    tetramer 

(a) Sac7d    1.53          1.20       0.95     0.75 
(b) Sso7d    1.56          1.21       0.96     0.73 


structure of proteins is the frictional ratio, f/f0, 
which can be calculated from the experimental R_s 
value and the theoretical minimal radius, R_0, 
for a given molecular weight. Table I shows that 
in 0.35 M NaCl the 7 kDa proteins are monomers, 
like the 7 kDa proteins from S. acidocaldarius. 
This is also in accordance with results from 
¹H-NMR experiments (data not shown). 
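
The oligomer assignment in Table I can be checked with a short calculation. The sketch below rests on assumptions that the text does not state: the minimal radius is taken as that of an anhydrous sphere, R_0 = (3·M·v̄ / 4π·N_A)^(1/3), with an assumed partial specific volume v̄ of 0.73 cm³/g and an assumed monomer mass of 7000 Da. The results therefore only approximate the tabulated ratios.

```python
from math import pi

AVOGADRO = 6.02214076e23  # molecules per mole


def minimal_radius_nm(mass_da, partial_specific_volume=0.73):
    """Radius (nm) of an anhydrous sphere of the given mass.

    partial_specific_volume is in cm^3/g; 0.73 is an assumed typical
    protein value, not a figure taken from the paper.
    """
    volume_cm3 = mass_da * partial_specific_volume / AVOGADRO
    radius_cm = (3.0 * volume_cm3 / (4.0 * pi)) ** (1.0 / 3.0)
    return radius_cm * 1e7  # cm to nm


def frictional_ratio(stokes_radius_nm, monomer_mass_da, n_subunits):
    """f/f0 for an assumed oligomer built from n_subunits monomers."""
    return stokes_radius_nm / minimal_radius_nm(monomer_mass_da * n_subunits)


# Experimental Stokes radii (nm) from Table I; 7000 Da is an assumed mass.
for name, r_s in (("Sac7d", 1.53), ("Sso7d", 1.56)):
    ratios = {n: round(frictional_ratio(r_s, 7000.0, n), 2) for n in (1, 2, 4)}
    print(name, ratios)
```

Because a frictional ratio cannot fall below 1, any oligomeric state whose ratio comes out under 1 is excluded, which is the reasoning behind the monomer assignment.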

Filter binding assays 

The original procedure [18] for filter binding 
assays used rather low ionic strength buffer (0.1 × 
SSC), which allows the nonspecific binding of 
basic proteins to nucleic acids by electrostatic 
interactions. In order to avoid this, the NaCl 
concentration of the binding buffer was increased 
to 0.25 M in 0.1 × SSC. It has been shown that at 
this ionic strength, basic proteins like lysozyme, 
cytochrome c or E. coli ribosomal proteins do not 
bind to DNA due to their basicity only [13]. Well 
established DNA-binding proteins like HU from 
E. coli and DNA-binding protein II from Bacillus 
stearothermophilus showed with these buffer 
conditions a binding capacity of 18% to 20% at a 
protein/DNA ratio of 25. The whole set of 
DNA-binding proteins from S. acidocaldarius 
clearly demonstrated binding capacities in the 
range of 5% to nearly 80% under the same 
conditions [12-14]. The filter binding assay of protein 
7d (Table II) resulted in a DNA-binding affinity 
of about 18% binding capacity, referring to the 
100% sample of [³H]DNA without protein content, 
at a protein/DNA ratio of 25. This value is slightly 
higher than that of the homologous protein from 
S. acidocaldarius, which can be explained by the 
different amount of methylated lysines. 

The results of the size exclusion experiments 
confirm qualitatively those from filter binding 
assays. If the protein/DNA ratio is increased 
drastically, free protein is fractionated by the Sephadex 
G-50 superfine column after the void volume peak, 
which contained the protein/DNA complex. The 
same results were obtained using either Sulfolobus 
or E. coli DNA. In the latter case, incubation 
temperature was decreased to 37°C. 

Electron microscopy 

Fig. 2 presents the electron micrographs of 
protein 7d in complex with both double- 
and single-stranded DNA. The formation of 
the protein-DNA complex results in highly 
condensed DNA-protein clusters. With increased 
protein/DNA ratios, the isolated clusters on the DNA 
merge more and more into a large central 
protein/DNA cluster, surrounded by loops of free 
DNA. A preference for single- or double-stranded 
DNA was not found. Similar structures have been 
observed for the 7 kDa proteins from S. acidocaldarius, 
which represent a very homogeneous group 
of five DNA-binding proteins [14,15]. All these 
highly similar proteins have been shown to interact 
specifically with single- and double-stranded 
DNA, although a sequence specificity has not 
been observed [19]. 




TABLE II 

MILLIPORE FILTER BINDING ASSAYS 

Increasing amounts of protein were incubated with 0.5 µg ³H-labeled DNA in the presence of 0.25 M NaCl in 0.1 × SSC. The 
DNA-binding affinity of protein 7d from S. solfataricus is shown. 100% affinity is equivalent to the total amount of [³H]DNA. 

Protein/DNA ratio (w/w)      10    15    20    25 
DNA-binding affinity (%)     10    13    16    18 
Fig. 2. Electron micrographs of nucleoproteins formed with 
protein 7d. Some complexes formed with (ss) DNA (φX 174) 
are marked with arrows. Clusters of bound protein on (ds) 
plasmid DNA RSF 1010, surrounded by free DNA, could be 
observed. 



Amino-acid sequence determination 

The complete amino-acid sequence of protein 
7d from the archaebacterium S. solfataricus and 
the strategy employed for the sequence determination 
are shown in Fig. 3. The amino-acid composition 
derived from the sequence is in good agreement 
with that obtained from the total hydrolysis 
of the protein (Table III). As derived from the 
amino-acid sequence, protein 7d contains modified 
lysines which were identified as monomethylated 
residues, partially modified at positions 4, 6, 
60 and 63 and fully methylated at position 62 
(see below). 

Occurrence of modified amino acids in the protein 

In the PTH-amino acid identification system of 
the liquid [21,22] and gas-liquid phase sequenator 
[23], a new peak was observed in steps 4, 6, 60, 62 
and 63. This modified derivative was identified 
on-line as ε-monomethyl-PTH-lysine in comparison 
with an authentic reference. 
[Fig. 3 sequence diagram. Recoverable sequence rows are given below; rows truncated in the source are marked "...". The SEQ, TRY and CB tracks marking the sequenced peptides (T1 to T11, CB1 to CB3) are not recoverable.] 

 1-17    Ala-Thr-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-Ile-Ser-... 
22-42    Val-Trp-Arg-Val-Gly-Lys-Met-Ile-Ser-Phe-Thr-Tyr-Asp-Glu-Gly-Gly-Gly-Lys-Thr-Gly-Arg- 
43-60    Gly-Ala-Val-Ser-Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Gln-Met-Leu-Glu-Lys*-... 

Fig. 3. Amino-acid sequence of DNA-binding protein 7d from S. solfataricus. Sequences of individual peptides and intact protein are 
indicated as follows: sequenced automatically using a pulsed gas-liquid phase sequencer [23], or a liquid-phase sequencer 
[20-22]; --, manual liquid-phase DABITC/PITC double coupling method [24,25]; ▷, solid-phase sequencing after homoserine-lactone 
attachment to aminopropyl glass (APG) [27,28]. TRY and CB indicate peptides derived from digestion with trypsin or cleavage 
with CNBr. Lys* indicates the Nε-monomethylated lysines. 
Furthermore, experiments with lysine derivatives 
showed that this unusual amino acid comigrates 
with the authentic o-phthaldialdehyde derivative 
of ε-monomethyllysine in the amino-acid 
analyzer [15]. Fig. 4 shows the HPLC separation 





TABLE III 

AMINO-ACID ANALYSIS OF THE DNA-BINDING PROTEIN 7d FROM S. SOLFATARICUS 

n.d., residues not determined by amino-acid analysis. 

            Number of residues derived by amino-acid: 
            sequence     analysis * 

Asp         3            2.6 
Asn         -            
Glu         7            9.0 
Gln         2            
Ser         3            2.4 
Gly         7            7.6 
Thr         3            2.4 
Arg         2            2.3 
Ala         3            3.0 
Tyr         2            1.7 
Trp         1            n.d. 
Met         2            1.2 
Val         5            5.6 
Phe         2            1.6 
Ile         3            2.9 
Leu         3            3.1 
Lys **      14           12.6 
Pro         1            n.d. 

* The values given are not corrected for destruction of amino 
acids or incomplete hydrolysis. 
** Lys refers to the sum of lysine and monomethylated lysine. 
Due to the presence of incompletely modified lysines, the 
value for lysines by amino-acid analysis cannot be calculated 
precisely. 



of a standard amino-acid mixture plus ε-monomethylated 
lysine after o-phthaldialdehyde derivatization. 
The additional peak which migrates between 
threonine and arginine derivatives was determined 
to be ε-monomethyllysine, whereas ε-dimethyllysine 
migrated after the arginine derivative 
and ε-trimethyllysine before glycine. 

Fig. 4. (a) Separation of 100 pmol of a reference amino-acid 
mixture containing Nε-monomethylated lysine, after 
ortho-phthaldialdehyde precolumn derivatization, by reversed-phase 
HPLC, using a column (250 × 4 mm) filled with Shandon 
Hypersil ODS 5 µm material. Buffer A was 12.5 mM Na2HPO4 
(pH 7.2), and buffer B was 3% tetrahydrofuran in methanol 
[27]. The peak which appears between threonine and arginine 
comigrates with authentic ε-monomethylated lysine (K*). (b) 
The amino-acid composition of protein Sso7d after total 
hydrolysis. The separation of the amino acids was as described in 
Fig. 4a. The characteristic peak for Nε-monomethyllysine 
(K*) appears at the same position in the chromatogram. (c) 
The amino-acid composition of the C-terminal peptide (CB3) 
after acid hydrolysis. The separation of the amino acids was as 
described in Fig. 4a. The peak marked with an asterisk shows 
the ε-monomethylated lysine residue. 

Fig. 4b shows the separation of the amino-acid 
derivatives of protein 7d produced after amino-acid 
hydrolysis. Between the arginine and 
threonine o-phthaldialdehyde derivatives, the 
ε-monomethyllysine of the hydrolysate of the 
DNA-binding protein 7d can be identified. 

Separation of tryptic peptides and N-terminal sequence region 

Fig. 5 demonstrates the separation of the tryptic 
peptides by RP-HPLC with a Vydac C18 column. 
Some peptides with the same amino-acid composition 
except for the lysine content elute at different 
retention times. This effect is probably caused by 
the different degree of methylation of lysine residues. 
Sequence information and o-phthaldialdehyde 
amino-acid determination demonstrate that 
the peptides T1₂ and T1₄ have Lys-4 modified, 
with the sequence Ala-Thr-Val-Lys* (pos. 1-4, 
see Fig. 3), while peptide T1₁ contains an unmodified 
lysine residue with the sequence 
Ala-Thr-Val-Lys. Peptide T1₃ is a mixture of the 
peptides T1₁ and T1₂. Peptide T2, Phe-Lys* (pos. 
5-6, see Fig. 3), is found in one position only. The 
degree of methylation, derived from the sequence 
of the intact protein and estimated by peak height, 
is approx. 90% for Lys-4 and 83% for Lys-6. 

The appearance of peptide T7 (pos. 28-39), 
which does not possess modified lysines, at two 
different positions may be due to partial oxidation 
of methionine. The degree of modification at 
Lys-60 appears to be the crucial factor for the elution 
of peptide T10 (pos. 52-60) at different positions. 
Amino-acid analysis of this peptide has shown 
that peptides T10₁ and T10₂ differ only at Lys-60, 
namely T10₁ contains unmodified lysine, while 
Lys-60 in T10₂ is monomethylated. 

C-terminal peptide regions 

The peptides produced after CNBr cleavage 
were separated by RP-HPLC either on a Vydac C4 
or C18 column as described in Experimental 
procedures. The C-terminal peptide (CB3) (pos. 
58-63) was isolated by using the Vydac C18 column 
and the homoserine peptides CB1 (pos. 1-28) 
and CB2 (pos. 29-57) by a Vydac C4 column. 
From the sequence determination and amino-acid 
analysis (Fig. 4c) of CB3, the following primary 
structure was derived: 58-Leu-Glu-Lys*-Gln-Lys*-Lys*-63. 
The degree of monomethylation, as 
estimated by peak height, is approx. 90%, 100% 
and 58% for lysine residues 60, 62 and 63, 
respectively. The number of lysine residues in the 
C-terminal peptide was substantiated by fast atom 
bombardment mass spectrometry [30]. 




Fig. 5. Separation of 20 nmol of the peptides derived by tryptic digestion of protein Sso7d by HPLC. The peptides were 
chromatographed on a Vydac C18 (201 TPB) column (250 × 4 mm) in a solvent system of 0.1% trifluoroacetic acid/acetonitrile. The 
gradient applied was 100% A for 10 min, 0-50% B in 180 min, 50-100% B in 20 min, 100% B for 5 min and 100-0% B in [duration illegible in source]. 
Measurements were made at 220 nm, 0.16 arbitrary units (full scale), at a flow rate of 1.0 ml/min. 
Because of the methylation of the lysines found 
here in the S. solfataricus 7d protein (Sso7d), the 
homologous 7d protein derived from S. acidocaldarius 
(Sac7d) was also examined for lysine modifications 
not previously identified [14]. We reinvestigated 
the Sac7d protein by liquid phase sequencing 
and isolation of the C-terminal CNBr 
fragment, and found Nε-monomethylated lysines 
at positions 4 and 6 (approx. 20% and 50%, 
respectively). However, in contrast to Sso7d, the 
corresponding Sac7d protein showed no modified 
lysine residues in the C-terminal sequence region. 

Secondary structure predictions 

The secondary structure of protein 7d has been 
predicted from the amino-acid sequence. Four 
different prediction methods according to Ref. 31 
were used to calculate the conformational states 
(Fig. 6). This protein possesses a higher amount of 
α-helical domains, about 35%, as compared to other 
7 kDa DNA-binding proteins from S. acidocaldarius, 
for which only about 15% helix content was calculated. 

Fig. 6. Secondary structure of DNA-binding proteins 7d from S. acidocaldarius and S. solfataricus as predicted by four different 
methods. The symbols represent residues in α-helical, β-sheet, β-turn and random coil conformations (symbols not recoverable 
from the source). The line Avg summarizes the secondary structure obtained when at least three of the four predictions are in agreement. 
The amino-acid sequences of the proteins are shown at the bottom line in the one-letter code. Sch, method according to Burgess et al. 
[33]; C&F, Chou and Fasman [34]; Nag, Nagano [35]; Rob, Robson and Suzuki [36]. 

Fig. 7. Structural homology between the 7d DNA-binding protein from S. solfataricus and the DNA-binding proteins 7a, 7b, 7c and 
7d from S. acidocaldarius cells. The alignment scores (SD units) calculated by the program ALIGN [32] using the standard mutation 
data matrix (100 random runs and a break penalty of 20) are: 

7d S. solfataricus - 7a S. acidocaldarius: 30.93 
7d S. solfataricus - 7b S. acidocaldarius: 29.54 
7d S. solfataricus - 7c S. acidocaldarius: 30.23 
7d S. solfataricus - 7d S. acidocaldarius: 32.63 

Gaps are shown as . . . . 
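
The consensus rule given in the Fig. 6 legend (the Avg line records a state only where at least three of the four prediction methods agree) can be sketched as follows. The one-letter state codes (H, E, T, C), the undecided marker "-" and the four example prediction strings are hypothetical illustrations, not the predictions shown in Fig. 6.

```python
from collections import Counter


def consensus_state(states, min_votes=3):
    """Return the state chosen by at least min_votes methods, else None."""
    state, votes = Counter(states).most_common(1)[0]
    return state if votes >= min_votes else None


def consensus_line(predictions, min_votes=3, undecided="-"):
    """Build the per-residue consensus from equal-length prediction strings."""
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    columns = zip(*predictions.values())
    return "".join(consensus_state(col, min_votes) or undecided for col in columns)


# Hypothetical predictions for a 7-residue stretch
# (H = helix, E = sheet, T = turn, C = coil).
example = {
    "Sch": "HHHHEEC",
    "C&F": "HHHEEEC",
    "Nag": "HHCEETC",
    "Rob": "HCHEETT",
}
print(consensus_line(example))  # prints "HHHEE-C"
```

With four methods and a threshold of three, at most one state can reach the threshold at any residue, so the consensus is never ambiguous; residues with a 2:2 or 2:1:1 split are left undecided.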

Homology to other DNA-binding proteins 

By sequence comparison, we found a strong 
degree of homology between protein 7d from S. 
solfataricus and the proteins from the 7 kDa group 
from the archaebacterium S. acidocaldarius (Fig. 
7), using the programme ALIGN [32]. No significant 
homology between protein 7d from S. solfataricus 
and DNA-binding proteins from other 
organisms has been found. 
The crystal structure of 
the hyperthermophile 
chromosomal protein Sso7d 
bound to DNA 



Sso7d and Sac7d are two small (~7,000 Mr), but abundant, chromosomal 
proteins from the hyperthermophilic archaebacteria 
Sulfolobus solfataricus and S. acidocaldarius respectively. These 
proteins have high thermal, acid and chemical stability. They 
bind DNA without marked sequence preference and increase 
the Tm of DNA by ~40 °C. Sso7d in complex with GTAATTAC 
and GCGT(iU)CGC + GCGAACGC was crystallized in different 
crystal lattices and the crystal structures were solved at high 
resolution. Sso7d binds in the minor groove of DNA and causes a 
single-step sharp kink in DNA (~60°) by the intercalation of the 
hydrophobic side chains of Val 26 and Met 29. The intercalation 
sites are different in the two complexes. Observations of this 
novel DNA binding mode in three independent crystal lattices 
indicate that it is not a function of crystal packing. 

How sequence-general DNA binding proteins bind to DNA 
is a fundamental question for understanding genome structure. 
However, few examples of structures of sequence-general DNA 
binding proteins bound to DNA are known. The high thermal, 
acid and chemical stability associated with Sso7d and Sac7d (Fig. 
1a) makes them an attractive system for structural, thermodynamic 
and DNA-binding studies. Sac7d and Sso7d have 
unfolding temperatures of greater than 90 °C (at pH 7.5, 0.3 M 
KCl) and both are acid stable with Tm's of >60 °C at pH 0. The 
Fig. 1 a, Amino acid sequences of recombinant 
Sso7d and Sac7d. b, Ribbon diagram of the 
Sso7d-GCGT(iU)CGC + GCGAACGC complex. All side 
chains of Sso7d are shown. The four bridging water 
molecules are shown as large purple spheres. DNA is 
colored red for the first two base pairs and green for the 
remaining six base pairs, separated by the intercalating 
amino acids (yellow). c, Superposition of three Sso7d 
structures from the Sso7d-GCGT(iU)CGC + GCGAACGC 
complex (yellow), the Sac7d-GCGATCGC complex 
(green) and the NMR solution structure (red). 
EXHIBIT 3 




Fig. 2 a, Stereoscopic surface drawing of the electrostatic 
potential of the Sso7d-GCGT(iU)CGC + 
GCGAACGC complex. The charge distribution of 
Sac7d was calculated in the absence of DNA. 
Sso7d is positively charged (+6), resulting from 14 
lysines, two arginines, seven glutamic acids and 
three aspartic acids on the protein surface. 
However, the complexes are negatively charged 
(-8) overall due to the additional 14 negative 
DNA phosphate charges. There is no apparent 
correlation between the monomethylation sites 
of lysines (Lys 5 and Lys 7) and the binding interface. 
Four bridging waters are found in the space 
between the protein and DNA. b, c, Detailed 
views at the protein-DNA interface of the 
Sso7d-GCGT(iU)CGC + GCGAACGC (left) and 
Sso7d-GTAATTAC (right) complexes. Selected 
side chains of Sso7d (red), three DNA base pairs 
(green) and four bridging water molecules (purple) 
are shown. 



solution structures of Sso7d and Sac7d, determined by NMR 
analyses, are similar to each other and they consist of an incomplete 
five-stranded β-barrel capped by an amphiphilic α-helix 
abutting the β3-β4-β5 strands. 

Both proteins bind to DNA without marked sequence preference 
and increase the Tm of DNA by ~40 °C. However, their 
DNA-binding mode has remained unclear until recently. 
Baumann et al. proposed that the β3-β4-β5 sheet is the putative 
DNA binding surface. McAfee et al. have shown that Sac7d 
binds to DNA with an average ratio of four base pairs per 
monomer of Sac7d with no cooperativity. Circular dichroism 
data also indicated that Sac7d induces a sequence-dependent 
cooperative structural transition in DNA. Another unusual property 
is the ribonuclease (RNase) activity associated with Sso7d, 
which has been called ribonuclease P2. However, similar studies 
on Sac7d did not produce conclusive evidence of any RNase activity 
(unpublished results of J.W.S.). 

We recently determined the crystal structures of two 
Sac7d-DNA complexes which revealed an unexpected DNA 
minor groove binding mode of Sac7d with the DNA duplex 
sharply kinked. Here we present the results of a parallel study on 
the structure determination of two Sso7d-DNA complexes. The 
complexes were crystallized in two new crystal lattices which 
afford us an excellent opportunity to compare the structure and 
DNA binding properties of not only the same protein (Sso7d) in 
different environments, but also different proteins (that is, Sso7d 
versus Sac7d). The structures are also compared with a recent 
Sso7d-DNA structure by NMR analysis. 

Overall structure of the complex 

The crystal structures of two Sso7d-DNA complexes, 
Sso7d-GCGT(iU)CGC + GCGAACGC (iU, 5-iodo-deoxyuridine) 
and Sso7d-GTAATTAC, have been solved and well-refined at high 
resolution (Table 1). All φ/ψ angles of the Ramachandran plot and 
other conformational parameters in both complexes fall within the 
acceptable regions. The Sso7d binding sites in DNA are sharply 
kinked and located at different places in the two complexes: at the 
C2pG3 step in the Sso7d-GCGT(iU)CGC + GCGAACGC complex 
(Fig. 1b) and at the A3pA4 step in the Sso7d-GTAATTAC 
complex respectively. The protein covers four bases and significantly 
widens the DNA minor groove. The other end of the DNA 
duplex remains B-DNA-like. These two complexes have different 
crystal packing interactions, indicating that the observed novel 
DNA binding mode is not a result of crystal packing and is an 
accurate reflection of the preferred protein-DNA interaction. 

The structures of the bound Sso7d in both complexes are very 
similar to each other with an r.m.s.d. of 0.51 Å (using Cα atoms of 
residues 2-60) and are generally similar to that of the free Sso7d 
determined by 2D-NMR analysis with an r.m.s.d. of ~2.10 Å 
(using Cα atoms of residues 2-60). Some differences exist in the 
orientation of the β1-β2 β-hairpin and in the conformations of 
the C-terminal α-helix (Fig. 1c). 
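
For readers unfamiliar with the r.m.s.d. figures quoted above, the quantity can be sketched in a few lines. This is a minimal illustration that assumes the two Cα coordinate sets have already been optimally superimposed and list matching atoms in the same order; the superposition step itself is not shown, and the coordinates are hypothetical, not Sso7d data.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched (x, y, z) lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


# Hypothetical three-atom example (angstroms), not real Sso7d coordinates.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.5, 0.0), (3.8, -0.5, 0.0), (7.6, 0.5, 0.0)]
print(round(rmsd(model_a, model_b), 2))  # prints 0.5
```

In the comparison described above, the two lists would hold the Cα positions of residues 2-60 from each structure.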

The molecular surface of Sso7d is irregular with numerous 
ridges and valleys (Fig. 2a). The excellent matching of shapes and 
charges between Sso7d and DNA in the complexes is evident. A 
long groove is visible which is occupied by DNA in the complex. 
There is also a significant crater created by the crossing of the 
β3-β4-β5 triple-stranded β-sheet and the β1-β2 β-hairpin. 

Sso7d has an OB-fold topology with a small hydrophobic core 
of only 11 residues (<25% solvent accessibility). Four glycines 
(Gly 10, 27, 38 and 39) are located in the loop regions. Many 
hydrophobic amino acids are solvent exposed (>45% solvent 
accessibility). The surface hydrophobic amino acids Trp 24, Val 
26, Met 29, and Ala 45 are involved in DNA binding contacts. 
Fig. 3 a, Detailed local structures 
at the protein-DNA interface 
of the Sso7d-GCGT(iU)CGC + 
GCGAACGC complex. Selected 
side chains of Sso7d are shown. 
b, Schematic diagram summarizing 
all the important Sso7d-DNA 
contacts. The filled, open and 
dashed arrows represent direct 
hydrogen bonds/salt bridges, van 
der Waals close contacts, and 
potential hydrogen bonds/salt 
bridges respectively. 



There are two 3₁₀-turns that allow the protein's main chain to 
change direction abruptly. The C-terminal helix is solvent 
exposed. A notable feature is the triple-stranded β-sheet 
(β3-β4-β5), whose interactions with DNA are summarized in Fig. 3. 

Bound DNA has a sharp kink 

The DNA is severely kinked (Fig. 4) by the bound Sso7d, as in the 
Sac7d-DNA structures. This type of DNA kink has been observed 
in the complexes of TBP and LEF-1, SRY (two HMG-box 
containing proteins) with their cognate specific DNA sequences, 
but is different from that induced by proteins that bend DNA more 
smoothly. The induced local DNA deformation is similar among 
different protein-DNA complexes, despite the different protein 
motifs. It should be noted that the ~61° single-step kink in the 
Sso7d-DNA and Sac7d-DNA complexes is the largest among all 
known structures of protein-DNA complexes. The solution structure 
of the Sso7d-CTAGCGCGCTAG complex has been analyzed by 
NMR recently and the DNA was found to be bent by 30°, significantly 
lower than that found in the crystal structures. The difference 
may be the result of the NMR refinement using a limited 
number of observed NOE crosspeaks between Sso7d and DNA due 
to the fast exchange between the free and bound DNA/protein. 

The bound DNA has a varying degree of helix unwinding at steps 
surrounding the intercalation sites (~14° at C2pG3, ~14° at G3pA4 
and ~12° at T4piU5). There is also a slight roll (11°) between the 
G3-C14 and A4-T13 base pairs, thus creating a total bend of 72°. 
Many nucleotides surrounding the wedge site adopt the less common 
C3'-endo (N-type) sugar puckers: C2 (N), G3 (S), T4 (N) and 
iU5 (N) in one strand and G15 (S), C14 (N), A13 (N) and A12 (S) 
in the other strand. The Sso7d-GTAATTAC complex has the same 
pattern. 

The DNA distortion seen in the complex described here most
likely represents the structural transition observed by McAfee et al.
using CD spectroscopy for the Sac7d system and the large heat
capacity change upon DNA binding observed by Lundback et al.
Such a structural transition (unwinding and/or bending) is
induced in DNA by Sac7d, which is cooperative in the sense that it
is necessary to have two proteins bound within a specified distance
(for example, 5 base pairs in duplex poly(dA-dT)) before the
transition occurs. The inherent resistance to the transition is
apparently negligible in short DNA sequences. Our preliminary 1D-NMR
titration of Sac7d to cisplatin-lesioned DNA indicates a tight
binding between Sac7d and the pre-kinked DNA, supporting the novel
binding mode observed in the crystal structures (unpublished
data).

Protein-DNA interface 

The binding of Sso7d to the minor groove of DNA involves a large 



Fig. 4 Stereoscopic view of the
intercalation sites. The local structures of the
two Sso7d-DNA complexes are
superimposed. The DNA octamer is kinked
61° at the C2pG3 step in the
Sso7d-GCGT(iU)CGC + GCGAACGC
complex and 62° at the A3-A4 step in
the Sso7d-GTAATTAC complex. The
sharp kink is due to the intercalation of
Val 26 and Met 29 amino acid side
chains into DNA base pairs from the
minor groove direction, widening the
minor groove at this step. The
insertions of β4-Met 29 and β3-Val 26 amino
acid side chains are ~13 Å deep. The
side chain of Met 29 lies close to the
base pair with the S-CH3 moiety
wedged between the C14 and G15
bases. Similarly the side chain of Val 26
is wedged between the C2 and G3
bases, with each of its CH3 groups
pointing toward a base.
binding surface area of about 20 Å x
20 Å (Fig. 2a). A significant component
of the free energy of binding is
due to non-electrostatic interactions,
made in large part by Trp 24, Val 26
and Met 29 (Fig. 4). In addition to the
obvious role of the β4-Met 29 and
β3-Val 26 amino acids, the single
tryptophan (β3-Trp 24) plays multiple roles.
First, its bulky ring fills up the space
between DNA and Sso7d. Second, its
indole NH group forms a specific
hydrogen bond (2.93 Å) to the N3 of
the G3 base, anchoring G3 in its open
(unstacked) position. Ala 45 also
makes a close van der Waals contact
with the deoxyribose of C14,
suggesting the requirement of a small
hydrophobic side chain of alanine at
position 45. Ser 31 receives a hydrogen
bond (2.87 Å) from the N2 amino
group of the G3 nucleotide.
Interestingly, in the Sso7d-GTAATTAC
complex Ser 31 forms a hydrogen
bond to the sulfur atom of Met 29.



Table 1 Crystal and refinement data of two Sso7d-DNA complexes

Crystal data
a (Å)                              47.60    47.52    47.78    46.87
b (Å)                              50.77    50.76    50.91    49.67
c (Å)                              42.06    41.97    42.03    37.65
Resolution (Å)                     2.0      2.0      2.0      1.7
No. of reflections (>1.0σ(F))      7,607    7,499    7,669    11,959
Temperature (°C)                   20       20       20       -150
Rmerge (%)                         7.53     6.37     7.33     7.37
Completeness (%)                   94.1     92.9     95.7     84.3
Completeness at highest
  shell for >2.0σ(F) (%)           83.0 (2.0-2.1 Å)           90.6 (1.70-1.78 Å)
Wilson B-factor (Å^2)              32.6     29.7     32.1     17.8
Mean overall figure of merit       0.83

Refinement data
No. of reflections (>2.0σ(I))      5,682    -                 9,488
Rworking/Rfree (10% data)          0.168/0.268                0.203/0.283
R.m.s.d. bond distance (Å)
  (Sso7d/DNA)                      0.010/0.007                0.014/0.009
R.m.s.d. bond angle (°)
  (Sso7d/DNA)                      1.37/1.20                  1.81/1.34
No. of atoms
No. of waters                      99                         114

The guanidinium group of Arg 43 is hydrogen bonded to the O2 atom
of iU5. Arg 43 is held in its place with the aid of Tyr 8 whose
aromatic ring is stacked on the deoxyribose of A13. The phenolic OH
group of Tyr 8 is linked to the N3 of A13 through a bridging water.
The hydrogen bond between Arg 43 and the O2 atom of a thymine
appears to be important and may determine the polarity of the Sso7d
binding mode. The structure of the Sso7d-GCGT(iU)CGC + GCGAACGC
complex shows that the Arg 43 of Sso7d is hydrogen bonded to iU5 of
the TT-strand, not to the AA strand. Therefore a combination of the
specific interaction between a guanine base and Ser 31, and the
hydrogen bond between Arg 43 and iU5-O2, may be important in
favoring the intercalation site at the C2pG3 step in this complex.

The formation of the complex is accomplished by specific hydrogen
bonds/salt bridges (Fig. 3). The number of salt bridges between the
protein and DNA is in excellent agreement with the five ionic
interactions predicted by the salt back-titrations of the Sac7d
complex using the theory of deHaseth et al. A somewhat smaller value
has been determined by salt-dependent isothermal titration
calorimetry on the binding between Sso7d and poly(dG-dC).

An important question is how do Sso7d and Sac7d bind to DNA in a
sequence-general manner. The answer may lie in the bridging water
molecules found in the buried cavity located between protein and DNA
(Fig. 2b,c). This cavity permits the G-C base pairs to be bound
without steric clash due to the additional N2 amino groups, thus
endowing the protein with a property required for its
sequence-general binding to DNA. For example, in the
Sso7d-GCGGTCGC + GCGACCGC complex (which has a G-C base pair,
instead of an A-T base pair, at the fourth position in the sequence),
we observed fewer intervening water molecules with a concomitant
movement of DNA base pairs (unpublished results). It is interesting
to note that bridging water molecules play an important role in
modulating the sequence-general binding of Sac7d and Sso7d by acting
as filler, whereas they play an entirely different role as specific
linkers between protein and DNA in defining the sequence specificity
in the Trp repressor-DNA recognition.

Biological implication

The structures of the Sso7d-DNA and Sac7d-DNA complexes offer new
insights into the possible role of several classes of DNA binding
proteins in transcription regulation. Some of those proteins,
including TBP, SRY, LEF-1 and PurR, bind in the minor groove of DNA
and kink the DNA duplex to a different degree. Additionally we noted
a possible structural alignment between Sso7d/Sac7d and the cold
shock proteins CspA/CspB. Both CspA and CspB are related to a class
of proteins called Y-box proteins, which have a wide-spread and
highly conserved nucleic acid-binding motif occurring from bacteria
to humans. Therefore this structural alignment between Sso7d/Sac7d
and CspA/CspB may be significant in understanding the Y-box proteins.

The new DNA binding mode of Sso7d/Sac7d may also offer a clue for
understanding the packaging of DNA in archaebacteria. Several models
of the polymeric Sso7d-DNA complex with different protein/DNA ratios
can be constructed by using the structure of the complex observed in
the crystals. Previously we presented a model in which the DNA is
maximally-loaded with Sac7d proteins. Additional modeling studies
showed that if the number of base pairs per protein monomer is
increased (for example, to 10 base pairs per protein), many
possibilities for DNA condensation may exist (data not shown).

Our study augments the understanding of chromatin structure in
archaea. On the one hand, histones or histone-like proteins (for
example, HMf) form nucleosomes. On the other hand, Sso7d/Sac7d may
bind to DNA in the minor groove and form higher ordered structures.
Thus two different types of DNA compaction mechanisms may be possible
in the Archaea: the mechanism described here with Sac7d/Sso7d, which
may be representative of the Crenarchaeota, and a nucleosome-like
structure for the HMf class of proteins found in the Euryarchaeota.

Interestingly, the bacterial HU protein has a different way of
forming chromatin structure. The crystal structure of the complex
between the integration host factor (IHF) and DNA revealed that
IHF induces two prominent kinks in the bound DNA, forming a
U-turn, by the partial intercalation of a proline from each of the two
long β-hairpins which wrap around the DNA. The sequence and
structural homology between IHF and HU suggest that HU may
organize chromatin using a minor-groove binding mode through
intercalation.

Methods 

The purified Sso7d protein was dialyzed against de-ionized water and
lyophilized. The complexes were crystallized using the vapor diffusion
method. The Sso7d-GTAATTAC complex and the two iodo derivatives
were crystallized from 1.3 mM Sso7d, 1.3 mM DNA duplex, 2 mM Tris
buffer (pH 6.5), 2.6% PEG400 solution, equilibrated with 15% PEG400.
The Sso7d-(GCGTTCGC + GCGAACGC) and iodo complexes were
crystallized under similar conditions except 5% 2-methyl-2,4-pentanediol
(MPD) solution was used and the solution equilibrated with 20% MPD.
Data were collected either at room temperature (20 °C) or at -150 °C on
a Rigaku R-Axis IIc image plate area detector system to various
resolution ranges (Table 1). The crystals of both complexes are in the space
group P2₁2₁2₁. The data were processed using the software provided by
Molecular Structure Corporation.

The phases were determined by the multiple isomorphous replacement
(MIR) method using two iodo derivatives (denoted as I-dU-02 and
I-dU-05 with the iodine located at positions T2 and T5 respectively) for
the Sso7d-GTAATTAC complex. The figure-of-merit weighted MIR map
with solvent flattening at 2.5 Å resolution clearly revealed both the
DNA and the Sso7d protein electron density. At that point the refined
structure of the Sac7d-GTAATTAC complex was used to model the
Sso7d-GTAATTAC complex into the MIR electron density. The model
was appropriately corrected against the un-biased map. The structure
was refined by the simulated annealing (SA) procedure incorporated in
X-PLOR using the data with |Fo| > 4σ(F) in the 6.0-1.9 Å range.
Simulated annealing and individual temperature factor refinements
were carried out by X-PLOR. Well-ordered water molecules were
located and gradually included in the model.

Crystals of the complex between Sso7d and GCGTTCGC +
GCGAACGC and the iodo-dU derivatives were obtained. It was found
that the sequence GCGT(iU)CGC + GCGAACGC produced the best
crystals and a 1.6 Å resolution data set was collected at -150 °C. The
structure of the complex was solved by the molecular replacement method
using the AMoRe package in the CCP4 suite. Similar SA refinement
was carried out with a final R-factor (working set) of 20.3% using the
|Fo| > 4σ(F) data in the 6.0-1.6 Å range.

Programs O, MIDAS Plus (University of California at San Francisco)
and QUANTA (version 4.0, Molecular Simulations, Burlington, MA) were
used to examine the electron density maps and molecular models. The
electrostatic potential diagram was calculated by GRASP. DNA force
field parameters of Parkinson et al. were used. All structures have
been refined by SA and individual B-factor refinement in X-PLOR.
During the refinement some rebuilding of the model was necessary to
improve the fitting of the model to the electron density. The crystal
data and refinement summaries are listed in Table 1.



Coordinates. The atomic coordinates of the two Sso7d-DNA
complexes have been deposited in the Brookhaven Protein Data Bank
(accession numbers 1BNZ and 1BF4 respectively).
Solution structure and DNA-binding
properties of a thermostable
protein from the archaeon
Sulfolobus solfataricus



Herbert Baumann, Stefan Knapp, Thomas Lundback, Rudolf Ladenstein and 
Torleif Hard 

The archaeon Sulfolobus solfataricus expresses large amounts of a small basic
protein, Sso7d, which was previously identified as a DNA-binding protein possibly
involved in compaction of DNA. We have determined the solution structure of
Sso7d. The protein consists of a triple-stranded anti-parallel β-sheet onto which an
orthogonal double-stranded β-sheet is packed. This topology is very similar to that
found in eukaryotic Src homology-3 (SH3) domains. Sso7d binds strongly (Kd < 10
µM) to double-stranded DNA and protects it from thermal denaturation. In
addition, we note that ε-mono-methylation of lysine side chains of Sso7d is
governed by cell growth temperatures, suggesting that methylation is related to
the heat-shock response.
DNA in a random coil conformation occupies a volume
that is almost always much larger than the cell in which
the molecules are contained. Thus, the DNA of all cells
must be structurally organized in a compact form and
yet be readily available for transcription. In the nucleus
of the eukaryotic cells the genomic DNA is packed by
histone proteins into nucleosomes, which in turn form the
higher-order structures of 'chromatin'. The structural
organization of DNA in prokaryotes is somewhat less well
understood. Archaea and bacteria contain abundant
small, basic proteins which are believed to be involved in
packing and unpacking, maintenance and control of the
genomic DNA (see refs 2-5 for reviews), one of the best
characterised being the HU protein from Escherichia coli.
Some of these proteins are also clearly evolutionary
related to eukaryotic histones (ref. 6 and work cited
therein). Others are believed to have more specialized
functions, such as to bend the DNA at specific sites.

The thermoacidophilic archaeon Sulfolobus, which can
be isolated from volcanic hot springs, expresses several
small, basic proteins. These proteins were first reported
by Thomm et al. (ref. 8) and were subsequently isolated,
characterized and sequenced by Reinhardt and co-workers.
The basic proteins isolated from Sulfolobus
acidocaldarius can be grouped into three molecular weight
classes, 7,000, 8,000 and 10,000 Mr (7, 8 and 10 kDa),
respectively. The 7 kDa proteins can be further separated
according to their basicity. Sequences are known
for the major component of the 7 kDa class in S.


solfataricus (Sso7d) and the corresponding protein
(Sac7d) as well as three minor components (Sac7a, Sac7b,
and Sac7e) in S. acidocaldarius. The sequences of these
proteins are compared in Fig. 1a. The proteins are all
very rich in lysine residues; 14 residues out of 63 in
Sso7d are lysines. Lysine residues at the amino and
carboxy termini (residues 4, 6, 60, 62 and 63 in Sso7d)
are subjected to ε-mono-methylation within the cell.

The function of the 7 kDa class of proteins in
Sulfolobus is not known. The initial reports emphasize
their DNA-binding properties. The proteins are small,
basic and abundant, that is 'histone-like'. Filter-binding
assays show that Sso7d binds DNA at physiological salt
concentrations and electron micrographs reveal the
formation of compacted nucleoprotein particles with both
double-stranded (ds) and single-stranded (ss) DNA.
The influence of sequence specificity on Sso7d binding
to dsDNA has not been investigated. The functional
significance of ε-mono-methylation of lysines and the effect
of various degrees of methylation on the DNA-binding
properties are unclear.

The Sso7/Sac7 class of proteins may also have other
functions in addition to DNA binding. For instance, the
protein contains the sequence GGGKTGRG (Fig. 1a),
which is reminiscent of the 'P-loops' found in several
classes of ATP- and GTP-binding proteins, and might
therefore be a phosphate binding site.

A protein in S. solfataricus, which appears to be
identical to the previously identified Sso7d, has been
suggested to act as a ribonuclease with a rather narrow
substrate specificity. The protein (called p2 by Fusi
et al., who compare p2 to Sac7d but seem to have been
unaware of the published Sso7d sequence) is reported
to be dimeric under native conditions. This observation
is in contrast to other data, which clearly show that Sso7d
is monomeric (ref. 12, present work).

Fig. 1 a, Aligned sequences of basic 7 kDa proteins
from S. solfataricus (Sso7d) and S. acidocaldarius
(Sac7a,b,d,e). The numbering refers to Sso7d. Stars
indicate lysine residues subjected to ε-mono-methylation.
The putative phosphate/nucleotide binding site in Sso7d
has been boxed. Residue 13 in Sso7d has been changed
from Glu to Gln based on NMR data. b, Mass spectra of
Sso7d from cultures grown at 75 °C and 88 °C. The numbers
indicate calculated masses for the various species. The
expected mass for the non-methylated species calculated
from the sequence is 7147.2 au.

The abundance of Sso7d in S. solfataricus, in
combination with its relatively small size, solubility,
thermostability, and ease of purification makes the protein
suitable for biophysical analyses and structure
determination. We have initiated a series of studies to determine
the structure and dynamics of the Sso7/Sac7 class of
proteins, their nucleic acid-binding affinities and
specificities, as well as possible nucleotide binding/hydrolysis.
In the present work we report on the structure of Sso7d
in solution and provide a more detailed characterization
of its DNA-binding properties. When analyzing the
structure of Sso7d we made the intriguing observation
that this abundant archaeal protein in fact is
structurally similar to that of SH3 domains involved in signal
transduction in eukaryotes. We also note that the extent
of ε-mono-methylation of lysine residues in Sso7d depends
on cell culture growth temperature, suggesting that the
methylation is a response to heat shock.

Purification and initial characterization 

Sso7d was purified from S. solfataricus (Methods); the
protein eluted in two peaks from the Mono S column
used in the final purification step. Mass spectrometric
analysis of (pooled) material from the two peaks
indicates the presence of six masses (Fig. 1b). Mass differences
correspond to sequential substitutions of hydrogen
atoms with methyl groups, as a result of the
ε-mono-methylation of lysine residues described previously. The
observation of six peaks with different methylation
patterns is consistent with the notion that five lysine
residues are subjected to methylation. The mass of the
species with the lowest molecular weight corresponds with
that calculated from the sequence (Fig. 1a).
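
The six-mass pattern can be checked with simple arithmetic: replacing one hydrogen by a methyl group adds a net CH2, about 14.03 mass units. The following is a minimal sketch, not the authors' calculation; it assumes the non-methylated mass of 7147.2 quoted in the Fig. 1 caption and standard average atomic weights, and the function name `methylation_ladder` is hypothetical.

```python
# Average mass gained when one hydrogen is replaced by a methyl group (net CH2),
# using standard atomic weights for carbon and hydrogen.
CH2_MASS = 12.011 + 2 * 1.008

def methylation_ladder(unmodified_mass, max_methyl_groups):
    """Return expected masses for 0..max_methyl_groups methyl groups."""
    return [round(unmodified_mass + n * CH2_MASS, 1)
            for n in range(max_methyl_groups + 1)]

# 7147.2 is the non-methylated mass quoted in the Fig. 1 caption; five
# methylatable lysines give six species (0 to 5 methyl groups).
masses = methylation_ladder(7147.2, 5)
for n, mass in enumerate(masses):
    print(f"{n} methyl group(s): {mass}")
```

Five methylation sites therefore predict six equally spaced masses, which is the pattern the text describes.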

Sso7d from the two fractions shows NMR chemical
shift differences of 0.02-0.12 p.p.m. affecting backbone
resonances of residues 2, 3, 6, 11, 12, 16, 17 and 44, but
connectivities involving these residues observed in 2D
NOESY spectra are practically identical for material from
the two fractions. The chemical shift differences are most
likely caused by electrostatic effects due to methylation
of one of the lysine residues, because differences in
chemical composition can be ruled out based on mass
spectrometry. The presence of two exchanging
conformers can also be ruled out because NOESY spectra
recorded on the two (separated) species (see Methods)
do not change within a period of several
months.

The extent of ε-mono-methylation of lysine side
chains varies with bacterial growth conditions so that
higher growth temperatures lead to more extensive
methylation (Fig. 1b). The physiological relevance of this
effect is not clear. It is possible that the lysine
methylation is directly related to the stability of the protein
and/or the DNA-protein complex and the response of
the organism to heat shock. The pKa of the lysine side
chain is affected very little by methylation and it seems
less likely that methylation has a direct effect on
DNA-binding affinity.

Sso7d binds strongly to dsDNA 

The equilibrium binding of Sso7d to various
polynucleotides was studied by monitoring changes in the
intrinsic tryptophan fluorescence on formation of the
complexes. The fluorescence of Trp 23, excited at 290 nm, is
quenched by 60-90% on binding and the emission
spectrum is shifted to longer wavelengths (not shown). The
results of titrations performed at low salt (buffer D) and
physiological salt concentration (buffer C) conditions,
respectively, are shown in Fig. 2a,b. Titration curves for
four different dsDNA polymers with alternating
purine-pyrimidine sequences at low salt show an observed
quenching, Qobs, which levels out at Qobs ≈ 0.9. There is
little difference in the apparent binding affinity to the
various dsDNAs at low salt, presumably due to
quantitative binding to all DNAs. The binding curves show
saturation at an approximate concentration ratio of 1:6
protein:DNA base pairs (bp), which can be taken as an
estimate of the lower limit for the Sso7d binding site
density on DNA.

There is a definite difference between the Sso7d
binding affinities to various dsDNA sequences at
physiological salt concentrations (Fig. 2b). The binding is
strongest to poly(dIdC) and poly(dAdU), for which the
affinities are approximately equal. The DNA
concentration at half saturation is in this case approximately 5 µM
bp. This number corresponds to an affinity constant of
~0.5-1×10^6 (M sites on DNA)^-1 if one (conservatively)
assumes that the maximum binding site density is in the
range 1:6-1:3 protein:DNA bp. Binding to poly(dGdC)
is somewhat weaker and binding to poly(dAdT) is about
5-10 times weaker than that to poly(dAdU) and
poly(dIdC).
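
The step from a half-saturation concentration to an affinity constant can be sketched as follows. This is only an order-of-magnitude illustration, not the authors' analysis: it assumes that the association constant is roughly the reciprocal of the binding-site concentration at half saturation (ignoring depletion of free sites by bound protein), and the half-saturation value of 5 µM bp is an illustrative input. The function name is hypothetical.

```python
def affinity_from_half_saturation(half_sat_bp_molar, bp_per_site):
    """Rough association constant (per M of sites), assuming K ~ 1/[sites]
    at half saturation and neglecting depletion of free sites."""
    site_concentration = half_sat_bp_molar / bp_per_site
    return 1.0 / site_concentration

HALF_SATURATION_BP = 5e-6  # mol/l of base pairs; illustrative assumption

# Binding-site densities of 1:6 and 1:3 protein:DNA bp bracket the estimate.
for bp_per_site in (6, 3):
    k = affinity_from_half_saturation(HALF_SATURATION_BP, bp_per_site)
    print(f"1:{bp_per_site} protein:bp -> K ~ {k:.1e} per M sites")
```

Fewer base pairs per site means more sites at a given DNA concentration, so the same half-saturation point implies a lower affinity per site.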

The binding affinities of Sso7d to various alternating
dsDNA sequences can be rationalized as follows. First, a

Fig. 2 Analysis of DNA binding by Sso7d. a, Equilibrium titrations of Sso7d with various polynucleotides and
monodeoxynucleosides based on fractional fluorescence quenching (Qobs). The titrations are performed at low salt
concentration (buffer D) as reverse titrations in which the protein concentration is kept constant (2 µM). b, Equilibrium
titrations performed at a higher salt concentration, which is closer to physiological conditions (buffer C), with 1 µM
protein. Titrations are shown for poly(dGdC), poly(dAdT), poly(dIdC), poly(dAdU), poly(dA), poly(dC), poly(rA),
poly(rC), dATP and dCTP. The abscissa legends indicate that
concentrations of double-stranded DNAs are measured in base pairs and concentrations of single-stranded
polynucleotides and monodeoxynucleosides are measured in bases. c, Thermal denaturation profiles of poly(dIdC) in the
absence and presence of bound Sso7d: no added protein, Sso7d added to a concentration corresponding to 1:15
Sso7d:DNA bp, and Sso7d added to a concentration corresponding to 1:3.6 Sso7d:DNA bp. The poly(dIdC)
concentration was 12 µM bp. The thermal denaturation experiments were performed at low salt concentration
conditions (buffer E).
Fig. 3 a, Two-dimensional 500 MHz NOESY spectrum of
Sso7d (concentration ~2.5 mM in 90%:10% H2O:D2O). b,
Schematic view of the two antiparallel β-sheets in Sso7d.
Hydrogen bonds used in the SA simulations and observed
NOEs are indicated with dashed lines and arrows,
respectively. Additional hydrogen bonds, not used in SA



methyl group at position 5 (in the major groove) of the
pyrimidine is unfavourable for binding. This is clear
when comparing binding to poly(dAdU) and
poly(dAdT). Thus, DNA-protein interactions may
occur within the DNA major groove. Second, binding to
dsDNA sequences with two inter-strand hydrogen bonds
is stronger than to those with three hydrogen bonds in
polymers lacking the pyrimidine methyl (that is, when
comparing poly(dAdU) and poly(dIdC) to poly(dGdC)).
This behaviour might be related to some physical
property such as flexibility, considering that Sso7d seems to
induce condensation of DNA.

Titration curves for Sso7d binding to ssDNA and
ssRNA homopolymers in the presence of low salt
concentrations show saturation at Qobs ≈ 0.6-0.7. The
binding to ssDNA and ssRNA under these conditions appears
to be weaker than that to dsDNA, although there is a
possibility that these complexes are as strong as those
with dsDNA but that the maximum binding-site
density is lower. However, the thermal denaturation studies
described below indicate that dsDNA is preferred over
ssDNA, because the melting temperature increases on
formation of the complex. Furthermore, increasing the
salt concentrations to physiological levels has a dramatic
effect on the binding to single-stranded polynucleotides
(Fig. 2b). Under these conditions there is only very weak
binding to poly(dA) and poly(dC), whereas no binding
to poly(rA) and poly(rC) can be detected at polymer
concentrations <100 µM bases. Thus, there seems to be
a large binding preference for dsDNA compared to
ssDNA and ssRNA at higher salt concentration
conditions.
At low salt concentrations it is also possible to
monitor binding of the monodeoxynucleosides dATP and
dCTP through the quenching of Trp 23 fluorescence (Fig.
2a). The titration curves do not show saturation and it
is difficult to estimate stoichiometries and affinities based
on the present data, but the binding seems to be weaker
than that of the DNA and RNA polymers.

Protection of DNA from denaturation 

Thermal denaturation profiles of double-stranded
poly(dIdC) in the absence and presence of bound Sso7d
are shown in Fig. 2c. Poly(dIdC) is thermally unstable
above 32 °C at the conditions used in the experiment
shown in Fig. 2c. Addition of less than stoichiometric
amounts of Sso7d increases the thermal stability of
poly(dIdC), yielding a biphasic DNA melting curve.
Saturation of poly(dIdC) with bound Sso7d again results in
a single phase denaturation profile with a melting
temperature of about 70 °C. Thus, binding of Sso7d increases
the melting temperature of poly(dIdC) by more than
38 °C at low salt concentrations. Similar, albeit somewhat
attenuated, effects can be observed with shorter DNA
oligomers at physiological salt concentrations (data not
shown). It is difficult to quantify the effect of Sso7d
binding to DNA polymers at high salt concentrations
because melting temperatures are high even in the absence
of bound protein. However, it seems possible that Sso7d
binding may shift the melting temperature of DNA above
the boiling point of water.



The remarkable effect of Sso7d binding on DNA
thermal stability is very similar to that of the HTa protein
from Thermoplasma acidophilum. Stein and Searcy
argue that the HTa protein may act to protect bacterial
DNA during short periods of denaturing conditions,
allowing the organism to cope with transient periods of
high temperatures. The Sso7d protein may function in a
similar manner in Sulfolobus. The different extent of
lysine methylation of proteins expressed at different
growth temperatures may also relate to the bacterial
response to heat shock and stabilization of functionally
important proteins. However, the effect of Sso7d
methylation on its DNA-stabilizing properties is
unknown.

NMR structure determination 

Two-dimensional NMR spectra of Sso7d were recorded
at 500 and 600 MHz. The 1H spectrum (Fig. 3a) shows a
very favourable resonance dispersion and could be
almost completely assigned using standard
methodologies. Upon assigning the sequence we found one
disagreement with the published sequence: residue 13,
which is a Glu in the sequence of Choli et al., is in fact
a Gln and this correction has been made in Fig. 1a. The
1H linewidths in Sso7d (3-8 Hz) are typical for a protein
with a relative molecular mass of 7,000, indicating that
Sso7d is predominantly monomeric under the
conditions used in the NMR experiments.

The NOESY spectrum of Sso7d contains stretches of
very strong sequential dαN(i,i+1) NOE connectivities in
combination with strong long range NOEs, which are
typical for β-sheet secondary structures. These arise
from one double-stranded and one triple-stranded
anti-parallel β-sheet (Fig. 3b). The pattern of intra- and
inter-residue NOE connectivities, the observation of
slowly exchanging backbone amide protons and low
amide temperature coefficients allowed the
identification of 14 intramolecular backbone-backbone
amide hydrogen bonds within the anti-parallel β-sheets
(Fig. 3b).

The three-dimensional structure of a fragment
containing residues 1-62 of Sso7d was calculated using a
dynamic simulated annealing (SA) protocol with 617
non-redundant NOE distance constraints, 11
dihedral-angle constraints and 28 hydrogen bond distance
constraints (two constraints per hydrogen bond), that
is 10.6 constraints per residue. The NOE distances (dij)
were distributed as 233 intraresidue (i=j), 151
sequential (|i-j|=1), 51 medium range (2≤|i-j|≤4), and 182 long
range (|i-j|≥5) NOEs (Table 1). The quality of the
computed SA structures is good as judged from the low
Lennard-Jones potential energies and the very small
average deviations from idealized geometries. The distance
constraint violation statistics are also good: the average
number of distance constraint violations >0.3 Å is 0.2
per structure and the largest violation found in any of
the 35 structures is 0.38 Å. The largest dihedral angle
constraint violation is 3.2°.
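
The constraint bookkeeping in this paragraph can be verified directly. The sketch below only re-adds the numbers quoted in the text (four NOE classes, 11 dihedral constraints, 14 hydrogen bonds with two constraints each, 62 residues); the variable names are illustrative.

```python
# NOE distance constraints by sequence separation, as quoted in the text.
noe_classes = {
    "intraresidue (i = j)": 233,
    "sequential (|i-j| = 1)": 151,
    "medium range": 51,
    "long range": 182,
}
dihedral_constraints = 11
hydrogen_bonds = 14
hbond_constraints = 2 * hydrogen_bonds  # two distance constraints per bond
residues = 62  # the calculated fragment spans residues 1-62

total_noe = sum(noe_classes.values())
total_constraints = total_noe + dihedral_constraints + hbond_constraints
per_residue = total_constraints / residues
print(total_noe, total_constraints, round(per_residue, 1))
```

The four NOE classes sum to 617, and the grand total of 656 constraints over 62 residues gives the quoted 10.6 constraints per residue.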

A plot of average backbone dihedral angles in the 35
SA structures is shown in Fig. 4a and plots of dihedral
angle order parameters are shown in Fig. 4b-d. Average
backbone dihedrals are all within the allowed regions of
a Ramachandran diagram (not shown), except for those
of Lys 8. The backbone of this residue is less well
defined, as judged from the angular order parameters,
which results in a sterically unfavourable geometric
average. The superimposed backbones of the final SA
structures are shown in stereo in Fig. 5a. The backbone
conformation within the β-sheet regions is well-defined, as
indicated by atomic backbone root-mean-square
deviations of 0.5±0.1 Å compared to the geometric average
structure (Table 2). Other regions are somewhat less well
defined, as indicated by an overall backbone r.m.s.d. of
1.1±0.2 Å. The side chains of several residues in the
hydrophobic core of Sso7d are also well resolved, as can
be seen in Fig. 5b. The C-terminal fragment (residues
46-60) is somewhat more well defined than the loop
regions, with a backbone r.m.s.d. of 0.9±0.2 Å, and a short
α-helix including residues 52-59 is clearly discernible.
This helix can also be deduced from a continuous stretch
of strong sequential and medium range
dNN(i,i+1) and dαN(i,i+3) NOE connectivities.

Table 1 Structural statistics for Sso7d (a)

Statistic                                   <SA>                (SA)r

R.m.s. deviation from experimental distance (Å)
and dihedral angle (deg) restraints (b)
  distance restraints (617)                 0.025 ± 0.0018      0.024
  dihedral angle restraints (11)            0.26 ± 0.23         0
No. of violations (c)
  distance restraints (>0.3 Å)              0.20                0
  dihedral angle restraints (>1°)           0.31                0
E(L-J) (kcal mol^-1) (d)                    -172 ± 20           -214
Deviations from idealized covalent geometry
  bonds (Å)                                 0.0025 ± 0.00016    0.0026
  angles (deg)                              0.36 ± 0.015        0.36
  impropers (deg)                           0.24 ± 0.03         0.22

(a) The notation of the NMR structures is as follows: <SA> are the final 35 simulated annealing structures; (SA)r is
the mean structure obtained by averaging the coordinates of the individual SA structures best fit to each other
followed by minimization by restrained regularization.
(b) The number of restraints is given in parentheses.
(c) The maximum distance violation is 0.38 Å and the maximum dihedral angle violation is 3.2° in an individual
structure.
(d) E(L-J) is the Lennard-Jones van der Waals energy calculated with the CHARMM force field.

structural biology volume 1 number 11 november 1994

Fig. 4 Average dihedral angles (a) and angular order parameters S
for dihedral angles of all residues in the 35 SA structures.

'\hc final set of SA structures contains several hydro- 
gen bonds, in addition to those used in the structure 
calculations. These involve the backbone amide protons 
and carbonyl oxygens of residues 18 and 15> 19 and 15, 
20 and >:> 25 and 28, 27 and 25, 50 and 46, and 50 and 
47, respectively. 

The Sso7d structure 

Sso7d is a globular protein. The tertiary fold consists of a triple-stranded anti-parallel β-sheet, consisting of residues 21-25, 28-34 and 41-46 (strands III, IV and V, respectively), onto which a double-stranded β-sheet, made up of residues 2-7 and 10-15 (strands I and II), is packed in an orthogonal manner. The hydrophobic core consists of side chains at the interface of the two sheets, including those indicated in Fig. 5b. Strands I and II are connected through a type II reverse turn with a hydrogen bond between the carbonyl of Tyr 7 and the amide of Glu 10. Strand II ends in one complete turn of an α-helix involving residues 16-19, with a hydrogen bond between the carbonyl of Asp 15 and the amide of Ile 19. Strands III and IV in the second β-sheet are connected by a type I reverse turn involving residues 25-28. Thus, hydrogen bonds between the carbonyl of Val 25 and the amide of Met 28, and the amide of Val 25 and the carbonyl of Met 28 are present in the triple-stranded β-sheet, in addition to those shown in Fig. 3b. Residues 35-40 form a surface loop, containing the glycine tripeptide Gly 36-Gly 37-Gly 38 (Fig. 6). The structure of this loop is not very well defined by the NMR constraints and it is clear that it can show a large degree of inherent flexibility. Strand V (residues 41-46) ends in a complete turn of an α-helix involving residues 47-50. This short helical segment is anchored through hydrophobic interactions involving Ala 50 and Pro 51. The backbone of the C-terminal fragment is not as well defined as the β-sheets, but residues 52-59 appear to form two turns of α-helix. This short helix is packed against the core through hydrophobic interactions between Leu 54 and Ala 50.

Table 2 Atomic r.m.s. difference statistics for the Sso7d structure(a)

Comparison                       Residues(b)                        Backbone(c) (Å)   All heavy atoms (Å)

<SA> vs SA (geometric average)   1-60                               1.08±0.17         1.60±0.16
                                 46-60                              0.95±0.22         1.72±0.28
                                 2-7, 10-15, 21-25, 28-34, 41-45    0.54±0.09         1.14±0.11
(SA) minimized vs SA             1-60                               0.45              0.80
(geometric average)

(a) Notations correspond to those defined in Table 1 with the addition that SA (geometric average) is the non-minimized geometric average structure. Residues 61 and 62 are excluded from the comparison due to lack of structural constraints in this region.
(b) Superimposed fragments.
(c) Atoms N, Cα and C'.

The surface of Sso7d contains a hydrophobic cleft and several exposed hydrophobic side chains (Fig. 6a). The hydrophobic cleft consists of the N-terminal Ala 1 side chain and the isoleucine residues Ile 16 and Ile 19 on one side, and the side chains of Pro 51, Leu 54 and Met 57 of the C-terminal helix on the other. The Trp 23 and Val 25 side chains of strand III are completely exposed to the solvent and so is the methyl of Ala 44. The side chains of Tyr 7 and Met 28 are partially exposed on the surface.

The many basic lysine and arginine side chains are rather evenly distributed at the surface and the positive charges seem to be partially compensated for by nearby acidic side chains. However, the face of the triple-stranded β-sheet appears to be predominantly positive in charge. This surface also contains the exposed Trp 23 side chain: the fluorescence of this residue is quenched by 90% upon formation of a complex with dsDNA. Thus, this face of the protein may be the DNA binding surface.

Sso7d and eukaryotic SH3 domains 

The topology of Sso7d is very similar to that of eukaryotic SH3 domains (Fig. 7a). The SH3 domains are small protein modules (about 60 residues) which, together with SH2 domains, are found in many proteins involved in signal transduction in eukaryotes. The SH2 and SH3 domains are commonly found in kinases or phospholipases, where they are believed to participate in protein-protein interactions. The structures of SH3 domains from several proteins have recently been solved by both NMR spectroscopy and X-ray crystallography.

The minimized average structure of Sso7d is compared with the structures of the SH3 domains of chicken brain α spectrin (PDB entry 1SHG) and human fyn proto-oncogene (PDB entry 1SHF) in Fig. 7a and an alignment of the three sequences based on secondary structure and folding topology is shown in Fig. 7b. The superimpositions included 38 Cα coordinates of the five β-strands and a fragment from the C terminus in Sso7d (residues 1-7, 10-16, 21-25, 28-33 and 41-53; Fig. 7b). The r.m.s.ds with corresponding fragments in the α spectrin and fyn SH3 domains are in both cases 3.3 Å. Thus, there is a good quantitative agreement between these structures. Differences are found at the N and C termini and for surface loops. In particular, the interconnection between the β-strands of the two SH3 domains which corresponds to strands IV and V in Sso7d is extended into the putative P-loop in Sso7d (Fig. 7).

Comparison of the complete sequences of Sso7d and the SH3 domains does not reveal sequence homology. However, homology can be inferred when considering only the fragments for which there is structural similarity, that is, when excluding loops and N and C termini, although any homology is still too weak to be conclusive by conventional alignment algorithms. Sequence identities and sequence similarities (aromatic/hydrophobic residues) in the fragments that were used in the structural alignment are shown in Fig. 7b. It is worth noting that several residues which are well conserved among various SH3 domains are present at the corresponding positions in Sso7d. These include Val 3 in Sso7d (an alanine in SH3), Phe 5 and Tyr 7 (aromatics), Lys 12 (lysine), Val 22 and Trp 23 (hydrophobic), Met 28 and Ile 29 (tryptophan and tryptophan/hydrophobic), Gly 43 (glycine), Ala 44 and Val 45 (aromatic or hydrophobic), and Ala 50 (hydrophobic). Sso7d and SH3 domains are also similar in that they expose hydrophobic surfaces.

The possible origin and significance of the structural similarity between Sso7d, which is an abundant protein in the archaeon Sulfolobus, and the SH3 domains, which appear to have assumed highly specialized roles in signal transduction in eukaryotes, is unclear. One scenario may be that the fold has survived in all kingdoms due to its (thermal) stability and because it forms a suitably small and stable platform for different functions in various organisms. An SH3-like fold has also recently been discovered for a small protein in the photosystem I complex (PsaE) in cyanobacteria. Structural similarities to SH3 have also been noted in another DNA-binding protein: the biotin biosynthetic operon repressor (BirA) in E. coli.

Fig. 5 a. Stereoview of superimposed backbone traces of residues 1-62 in Sso7d. For the sake of clarity, only 11 of the 35 SA structures are shown. The structures are superimposed to minimize r.m.s. differences of backbone atoms in residues 1-60. N and C termini are coloured in blue and red, respectively. The loop containing the putative phosphate/nucleotide binding site is coloured in green. b. Stereoview showing the resolution and packing of hydrophobic side chains in the protein core. The structures have in this case been superimposed to minimize r.m.s. deviations between heavy atoms of residues constituting the core.

Fig. 6 Space-filling model of Sso7d showing exposed hydrophobic (yellow) and aromatic (orange) side chains (tyrosine hydroxyls are also coloured in orange). The glycines in fragment 36-38 are coloured in green. The views in (a) and (b) are from opposite directions. N and C termini are indicated in (a).

Methods

Protein purification. Sulfolobus solfataricus (DSM 1617), isolated from volcanic hot springs in Italy, was purchased from the "Deutsche Sammlung von Mikroorganismen" (DSM, Braunschweig). Cultivation was performed aerobically at 75 °C [...] with an additional 10 g l⁻¹ saccharose in a membrane fermenter (Bioengineering). The cells were heat-shocked for 90 min at 88 °C and harvested by centrifugation. Protein was also purified from cells that had not been subjected to heat shock, for comparison of the extent of lysine methylation.

100 g cells were lysed in buffer A (10 mM Tris buffer, pH 8.8 with 20 mM NaCl, 10% glycerol) by passing the cell suspension through a French press. The lysate was centrifuged to remove cell debris and dialyzed against the same buffer. The cytosolic proteins were loaded onto a Mono Q (Pharmacia HR 10/10) column equilibrated with buffer A; Sso7 was found in the flow-through. This fraction was concentrated in an Amicon stirred cell and applied in 1.5 ml fractions to a Superose 6 column (90 x 1.5 cm) equilibrated with 30 mM Tris/HCl and 200 mM NaCl at pH 7.4. Fractions containing Sso7 were pooled, dialyzed against 50 mM potassium phosphate, 50 mM NaCl at pH 6.0, loaded onto a Mono S (Pharmacia HR 10/10) column equilibrated with the same buffer and eluted with a linear gradient of buffer B (50 mM potassium phosphate, pH [...] NaCl). Sso7d eluted at 25% B in two separate peaks, due to the presence of differently methylated species of the protein.

Sso7d concentrations were measured spectrophotometrically on a Cary 4E spectrophotometer using an extinction coefficient calculated from tyrosine (ε = 1400 M⁻¹ cm⁻¹) and tryptophan (ε = 5500 M⁻¹ cm⁻¹) absorption.
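
The extinction coefficient calculation described above can be sketched as a sum of per-residue contributions followed by the Beer-Lambert law. This is a minimal illustration and not the authors' procedure: the residue counts in the usage line (two tyrosines, one tryptophan), the absorbance reading and the function names are assumptions introduced here; only the two per-residue coefficients come from the text.

```python
# Per-residue molar extinction coefficients quoted in the text (M^-1 cm^-1).
EPS_TYR = 1400.0
EPS_TRP = 5500.0


def extinction_coefficient(n_tyr: int, n_trp: int) -> float:
    """Sum tyrosine and tryptophan contributions (assumes additivity)."""
    if n_tyr < 0 or n_trp < 0:
        raise ValueError("residue counts must be non-negative")
    return n_tyr * EPS_TYR + n_trp * EPS_TRP


def concentration_molar(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical residue counts and absorbance, for illustration only.
    eps = extinction_coefficient(n_tyr=2, n_trp=1)
    print(f"epsilon = {eps:.0f} M^-1 cm^-1")
    print(f"c = {concentration_molar(0.83, eps) * 1e6:.1f} uM")
```

The additive model ignores any environmental effects on the chromophores, which is the usual approximation for a small protein without cystines.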

NMR samples were prepared in 90%:10% H₂O:D₂O or 100% D₂O with 20 mM potassium phosphate (pH 5 or 6), 50 mM NaCl and 0.1% azide. The structure determination is based on data recorded on the following four NMR samples: 2.5 mM protein at pH 6 containing material from both peaks eluted from the Mono S column; ~0.2 mM protein at pH 6 containing material eluting under peak 2; 1 mM protein at pH 5 containing material eluting under peak 1; and 2 mM protein containing both fractions in D₂O buffer at pH 6 (non-corrected pH meter reading). The first and last samples contained two distinct NMR species. A combination of spectra collected on the second and third samples corresponds to the NMR spectrum of sample 1.

Mass spectrometric analysis. Mass spectrometry (MS) was carried out at Pharmacia Bioscience Center, Stockholm, using a VG Platform mass spectrometer from Fisons Instruments equipped with an electrospray interface. The mobile phase consisted of methanol:water (1:1) with 1% acetic acid. The range 700 < (M/z) < 1700, where M is the mass and z is the charge, was scanned and calibrated using horse heart myoglobin as a calibration standard. Uncertainties in molecular mass determinations are approximately two mass units.
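
To illustrate what the scanned window implies for a protein of this size, the sketch below lists the charge states whose peaks would fall between 700 and 1700. It assumes positive-ion electrospray in which each charge is carried by one added proton, so that the observed value is (M + z·m_p)/z; that ionization model, the 7000 Da example mass and the function names are assumptions made here, not details given in the text.

```python
PROTON_MASS = 1.00728  # Da, mass of one added proton (assumed ionization model)


def mz(mass_da: float, z: int) -> float:
    """Mass-to-charge ratio of an [M + zH]^z+ ion."""
    if z <= 0:
        raise ValueError("charge must be a positive integer")
    return (mass_da + z * PROTON_MASS) / z


def charge_states_in_window(mass_da: float, low: float = 700.0,
                            high: float = 1700.0, z_max: int = 50) -> list:
    """Charge states whose m/z lies strictly inside the scanned window."""
    return [z for z in range(1, z_max + 1) if low < mz(mass_da, z) < high]


if __name__ == "__main__":
    # Hypothetical 7000 Da protein, for illustration only.
    for z in charge_states_in_window(7000.0):
        print(z, round(mz(7000.0, z), 2))
```

Several adjacent charge states inside the window are what allow a molecular mass to be deconvoluted from an electrospray spectrum.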

Equilibrium titrations. The DNA and RNA polynucleotides used were purchased from Pharmacia and dissolved in 150 mM NaCl and 10 mM Tris/HCl at pH 7.4. Polynucleotide concentrations were determined spectrophotometrically using extinction coefficients given by Pharmacia. The deoxynucleosides ATP and CTP were purchased from Boehringer-Mannheim.

Equilibrium titrations were carried out at 20 °C in buffer C (100 mM NaCl, 1 mM MgCl₂, 0.1 mM octaethylene glycol monododecyl ether (C₁₂E₈) and 20 mM Tris/HCl at pH 7.4) and in buffer D (0.5 mM C₁₂E₈ and 20 mM Tris/HCl at pH 7.4), for which the pH measurements refer to 20 °C. Titrations were performed as reverse titrations, in which different amounts of DNA/RNA were added at constant protein concentration (1 µM in buffer C and 2 µM in buffer D). Steady-state fluorescence measurements were carried out on a Shimadzu RF-5000 spectrofluorophotometer using the methodology and additional titration instrumentation recently described elsewhere. The excitation wavelength was 290 nm and emission intensities were sampled at 0.2 nm intervals within the wavelength range 340-355 nm. Emission spectra were recorded five times for each titration point in order to minimize effects of instrumental fluctuations. Measured fluorescence intensities were corrected for background emission by subtracting (small) signals from buffer samples and for optical filtering effects due to DNA absorption at 290 nm.
The fractional fluorescence quenching (Q) was calculated as (I₀ - I)/I₀, where I₀ is the protein fluorescence intensity observed in the absence of DNA/RNA and I is the intensity in the presence of DNA/RNA. Binding isotherms are presented as plots of Q against the logarithm of the basepair (dsDNA) or base (ssDNA, ssRNA and monodeoxynucleosides) concentration.
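
The quenching calculation above reduces to a few lines. The following sketch computes Q = (I₀ - I)/I₀ for each titration point and pairs it with the base-10 logarithm of the nucleic acid concentration, as in the binding isotherms described. The intensities and concentrations in the usage example are invented for illustration, and the intensities are assumed to have already been corrected for background and optical filtering.

```python
import math


def fractional_quenching(i0: float, i: float) -> float:
    """Q = (I0 - I) / I0, with I0 the intensity in the absence of DNA/RNA."""
    if i0 <= 0:
        raise ValueError("I0 must be positive")
    return (i0 - i) / i0


def isotherm_points(i0: float, titration: list) -> list:
    """Map (concentration, intensity) pairs to (log10 concentration, Q) pairs."""
    return [(math.log10(conc), fractional_quenching(i0, inten))
            for conc, inten in titration]


if __name__ == "__main__":
    # Hypothetical corrected intensities at three base-pair concentrations (M).
    data = [(1e-7, 95.0), (1e-6, 60.0), (1e-5, 12.0)]
    for log_c, q in isotherm_points(100.0, data):
        print(f"log10[bp] = {log_c:.1f}  Q = {q:.2f}")
```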

DNA melting studies. Light absorption of poly(dIdC) at 260 nm was measured as a function of temperature on a Cary 4E spectrophotometer, which allows the simultaneous measurement of up to six melting curves. The temperature was increased in steps of 1 °C during a time period of 30 s, followed by a holding time of 60 s prior to absorbance measurements. The denaturation experiments were performed in 5 mM Tris/HCl at pH 7.0 (buffer E) with various concentrations of added Sso7d.
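
As a rough guide to how long such a melting curve takes, the sketch below adds up the ramp and hold times stated above (30 s to raise the temperature by 1 °C, then a 60 s hold before each reading). The start and end temperatures in the example are hypothetical, since the text does not give the scanned range.

```python
def melt_duration_s(t_start_c: int, t_end_c: int,
                    ramp_s: float = 30.0, hold_s: float = 60.0) -> float:
    """Total time for a stepwise ramp with one 1 degC step per cycle."""
    if t_end_c < t_start_c:
        raise ValueError("end temperature must not be below start temperature")
    n_steps = t_end_c - t_start_c  # one cycle per degree
    return n_steps * (ramp_s + hold_s)


if __name__ == "__main__":
    # Hypothetical scan from 20 to 100 degC.
    seconds = melt_duration_s(20, 100)
    print(f"{seconds / 3600:.1f} h")
```

The resulting effective heating rate is 1 °C per 90 s, or about 0.67 °C per minute.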

NMR spectroscopy. NMR spectra were recorded on Varian Unity 500 and 600 NMR spectrometers operating at magnetic fields of 11.74 and 14.09 T, respectively, and equipped with programmable pulse modulators and pulsed field gradient hardware. Spectra were recorded at 293, 303, 313 and 323 K. ¹H chemical shifts at 303 K (available from the authors) are referenced to H₂O at 4.74 p.p.m. Phase sensitive two-dimensional spectra were recorded in the hypercomplex mode.
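
The correspondence between the quoted field strengths and the nominal 500 and 600 MHz spectrometer frequencies follows from the proton Larmor relation ν = (γ/2π)·B. A short check, using the standard proton value γ/2π ≈ 42.577 MHz per tesla (a physical constant, not a number taken from the text):

```python
GAMMA_BAR_1H = 42.577478  # proton gyromagnetic ratio / 2*pi, in MHz per tesla


def proton_frequency_mhz(field_tesla: float) -> float:
    """1H Larmor frequency in MHz for a given static field in tesla."""
    return GAMMA_BAR_1H * field_tesla


if __name__ == "__main__":
    for b0 in (11.74, 14.09):
        print(f"{b0} T -> {proton_frequency_mhz(b0):.1f} MHz")
```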



Two-dimensional homonuclear DQF-COSY, NOESY, and clean-TOCSY spectra were recorded using spectral widths of 6000 Hz, 2*512 t₁ increments, 1024 complex data points in the acquisition time domain and with 8-32 transients per t₁ increment. NOESY spectra were recorded using cross relaxation mixing times of 60 or 200 ms and clean-TOCSY spectra were recorded using isotropic mixing times of 10, 60 or 80 ms. A 2D ¹H,¹³C-HSQC spectrum was recorded using gradient selection with ¹H and ¹³C sweep widths of 6000 Hz and 20000 Hz, respectively, 2*128 t₁ increments, 512 complex data points and 160 transients per increment. The HSQC sequence was optimized for a C-H scalar coupling constant of 140 Hz, with the ¹³C transmitter placed at 57 p.p.m. 2D SS-NOESY spectra were recorded with a sweep width of 8000 Hz and a 200 ms mixing time. The third pulse in the SS-NOESY sequence is a shifted laminar pulse creating a zero net excitation at the frequency of the transmitter (water resonance). Water suppression was achieved by presaturation of the water signal or presaturation in combination with SCUBA water suppression. No presaturation was used in the HSQC and SS-NOESY experiments.

NMR spectra were processed using software from Varian (VNMR) and/or Biosym Technologies (Felix 2.2). Data processing typically involved apodization with shifted Gaussian functions in the t₂ (acquisition time) domain and sine/cosine bell functions in t₁, and baseline correction using routines available within the two software packages. Processed spectra typically contained 1024x1024 real data points.

Fig. 7 a. Comparison of folding topologies in Sso7d and SH3 domains. The stereo picture contains the superimposed backbones of Sso7d (grey), the SH3 domains of chicken brain α spectrin (green), and the human fyn proto-oncogene (blue). b. Secondary structure based alignment of the Sso7d sequence to those of the SH3 domains of chicken brain α spectrin (C spec α), and the human fyn proto-oncogene (H fyn). Elements of secondary structure in Sso7d are shown at the top. The numbering refers to the Sso7d sequence. The grey bars indicate fragments used in the structure-based alignment. Orange boxes indicate similar or identical hydrophobic residues within the aligned sequences. The blue and green boxes denote a lysine and a glycine which are located at identical positions in the aligned sequences.

NMR data analysis. Spin system identification and sequential resonance assignments of ¹H resonances in Sso7d were carried out in homonuclear 2D spectra using standard methodologies. The natural abundance ¹³C/¹H HSQC spectrum aided significantly when sorting out Hα, methyl and aromatic resonances. Most assignment work and collection of NOE constraints were carried out on spectra recorded at 303 K. Analysis of NMR spectra and compilation of NOE data were performed using the interactive computer graphics program ANSIG.

Stereospecific assignments of prochiral methylene groups were carried out by identifying predominant χ¹ rotameric states using ³Jαβ coupling constants measured in DQF-COSY spectra and intraresidue NOEs measured with a short (60 ms) mixing time. The relative magnitudes of the ³Jαβ2 and ³Jαβ3 coupling constants could also be measured in clean-TOCSY spectra recorded with a short (10 ms) mixing time using reported simulations as a reference for expected cross peak intensities. Valine methyl groups were stereospecifically assigned and χ¹ rotamers determined from the magnitude of the ³Jαβ coupling and the relative intensities of intraresidue NOE connectivities (note that the notations of valine γ1 and γ2 methyls in ref. 41 are exchanged compared to convention).

The χ¹ rotameric states of Thr 2 and Thr 32 were estimated as follows. Both residues have relatively small ³Jαβ coupling constants and the HN-Hα cross peaks in DQF-COSY are quadratic, indicating that χ¹ = 60° or χ¹ = 180°. Inspection of the short mixing time NOESY spectrum revealed that the relative intensities of the intraresidue NOEs in Thr 2 are consistent with χ¹ = 180°, whereas those in Thr 32 are consistent with χ¹ = 60°.

NOEs were quantified as distance constraints based on cross peak volumes measured in a NOESY spectrum recorded with a mixing time of 60 ms. The conversion of volumes into distances was based on calibration of observed intraresidue and sequential NOEs within well-defined segments of anti-parallel β-sheet. NOE volumes involving HN protons were corrected for the presence of 10% D₂O in the sample. Cross peak volumes involving methyl protons were divided by three prior to conversion into distance constraints. Distance constraints were divided into four classes: strong (<2.7 Å), medium (<3.3 Å), weak (<5.0 Å) and very weak (<6.0 Å). Pseudoatoms with appropriate distance corrections were applied for non-stereospecifically assigned methylene protons, aromatic ring protons and the methyl groups in leucines. A (reduced) pseudoatom correction of 0.3 Å was used to account for effects due to rapid rotation of methyl groups.
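
The conversion from cross peak volume to distance class described above can be sketched as follows. The text does not give the calibration formula, so the sketch assumes the common isolated-spin-pair relation in which the volume scales as r⁻⁶ relative to a reference pair of known distance; that relation, the reference values and the function names are assumptions made here. The division of methyl volumes by three and the four upper bounds are taken from the text.

```python
# Upper distance bounds (Angstrom) for the four restraint classes in the text.
CLASSES = [("strong", 2.7), ("medium", 3.3), ("weak", 5.0), ("very weak", 6.0)]


def volume_to_distance(volume: float, ref_volume: float, ref_distance: float,
                       is_methyl: bool = False) -> float:
    """Estimate a distance from an NOE volume, assuming volume ~ r**-6."""
    if volume <= 0 or ref_volume <= 0 or ref_distance <= 0:
        raise ValueError("volumes and reference distance must be positive")
    if is_methyl:
        volume = volume / 3.0  # methyl volumes are divided by three
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def classify(distance: float):
    """Return (class name, upper bound) for the tightest class that fits."""
    for name, upper in CLASSES:
        if distance <= upper:
            return name, upper
    return None  # too long to be used as a restraint


if __name__ == "__main__":
    # Hypothetical calibration: a reference pair at 2.2 Angstrom with volume 100.
    d = volume_to_distance(volume=12.0, ref_volume=100.0, ref_distance=2.2)
    print(round(d, 2), classify(d))
```

Binning estimated distances into a few generous upper bounds, rather than using the estimates directly, makes the restraints tolerant of spin diffusion and volume integration errors.
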

A total of 14 hydrogen bonded amide protons could be identified either as slowly exchanging resonances in a TOCSY spectrum of Sso7d dissolved in D₂O, or as amide-proton resonances for which the temperature dependence of the chemical shift is small (< 5 p.p.b. K⁻¹) compared to that of C-terminal residues which are exposed to the solvent (> 8 p.p.b. K⁻¹). These experimentally supported hydrogen bonds (between backbone amide protons and carbonyl oxygens) were imposed in the structure calculations as 28 distance constraints with lower and upper bounds of 1.8 Å and 2.4 Å for amide hydrogen to carbonyl oxygen distances, and 2.6 Å and 3.4 Å for amide nitrogen to carbonyl oxygen distances, respectively. The hydrogen bond constraints were imposed at a late stage of the structure refinement at which point hydrogen bond donor-acceptor pairs could be unambiguously identified. All hydrogen bonds used in the calculations are within well-defined regions of anti-parallel β-sheet. A table of sequential assignments of the Sso7d NMR spectrum at 30 °C and pH 6.0 is available from the authors on request.
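
The bookkeeping behind "14 hydrogen bonds imposed as 28 distance constraints" is shown in the sketch below: each donor-acceptor pair contributes one HN-O and one N-O restraint with the bounds quoted above. The residue pairs generated in the example are placeholders, since the text does not list which 14 pairs were used, and the record layout is an illustrative choice rather than an X-PLOR input format.

```python
from typing import List, NamedTuple, Tuple


class Restraint(NamedTuple):
    donor_residue: int
    acceptor_residue: int
    atoms: str      # "HN-O" or "N-O"
    lower: float    # Angstrom
    upper: float    # Angstrom


def hbond_restraints(pairs: List[Tuple[int, int]]) -> List[Restraint]:
    """Expand each (donor, acceptor) pair into the two restraints from the text."""
    restraints = []
    for donor, acceptor in pairs:
        restraints.append(Restraint(donor, acceptor, "HN-O", 1.8, 2.4))
        restraints.append(Restraint(donor, acceptor, "N-O", 2.6, 3.4))
    return restraints


if __name__ == "__main__":
    # Placeholder pairs only; the actual 14 pairs are not listed in the text.
    placeholder_pairs = [(i, i + 20) for i in range(1, 15)]
    print(len(hbond_restraints(placeholder_pairs)))
```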

Structure Calculations. Three-dimensional structures were determined using a dynamic simulated annealing (SA) method implemented within the X-PLOR 3.0 program. The protocol of Nilges et al. was used with some modifications, as described below. Extended peptide conformations were used as starting structures in the simulations. The X-PLOR force field, containing potentials for chemical bonds, repulsive van der Waals' interactions and experimental (distance and dihedral) constraints, was used. The force constant of the distance constraint potential was set to 50 kcal mol⁻¹ Å⁻² and the force constant of the dihedral (χ¹) square well potential was set to 200 kcal mol⁻¹ rad⁻². Force constants for planarity and chirality were set to 50 kcal mol⁻¹ rad⁻². The simulations were carried out in five stages: i, 100 steps of Powell energy minimization to remove bad non-bonded contacts; ii, 15 ps of dynamics at 1000 K with normal van der Waals radii and a low repulsive force constant (0.002 kcal mol⁻¹ Å⁻⁴); iii, 10 ps of 1000 K dynamics during which the repulsive force constant was increased to 0.1 kcal mol⁻¹ Å⁻⁴ and the asymptote in the NOE soft square well potential (constant c in ref. 44) was increased from 0.0 to 1.0 (in 10 steps); iv, cooling to 300 K during 5.6 ps (28 steps of 0.2 ps with 25 K cooling/step) with a repulsive force constant of 4.0 kcal mol⁻¹ Å⁻⁴ and van der Waals' radii scaled by 0.8; and v, 1200 steps of Powell minimization with normal van der Waals radii and force constants for planarity and chirality set to 500 kcal mol⁻¹ rad⁻². A 1 fs time step was used throughout with bonds constrained using the SHAKE algorithm during stages ii-iv.
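
The numbers given for the cooling stage (iv) are internally consistent, as the short sketch below shows: 28 steps of 25 K take the system from 1000 K to 300 K, and 28 × 0.2 ps gives 5.6 ps, or 200 integration steps per temperature at a 1 fs time step. The function is only an arithmetic illustration written for this purpose; it is not X-PLOR input, and whether the first 0.2 ps segment runs at 1000 K or at 975 K is an assumption (the latter is used here).

```python
def cooling_schedule(t_start: float = 1000.0, t_end: float = 300.0,
                     step_k: float = 25.0, ps_per_step: float = 0.2,
                     timestep_fs: float = 1.0):
    """Return (temperatures, total time in ps, MD steps per temperature)."""
    if step_k <= 0 or t_start <= t_end:
        raise ValueError("need t_start > t_end and a positive cooling step")
    n_steps = round((t_start - t_end) / step_k)
    # Assumed convention: the temperature is lowered before each segment.
    temperatures = [t_start - step_k * (i + 1) for i in range(n_steps)]
    total_ps = n_steps * ps_per_step
    md_steps = round(ps_per_step * 1000.0 / timestep_fs)
    return temperatures, total_ps, md_steps


if __name__ == "__main__":
    temps, total_ps, md_steps = cooling_schedule()
    print(len(temps), temps[0], temps[-1], round(total_ps, 1), md_steps)
```
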

An ensemble of structures was initially calculated after the sequential assignments were almost completed and about 300 distance constraints had been collected. The simulations were then repeated several times during structure refinement. The final round of SA contained 50 simulations, out of which 35 converged yielding low energy structures. An average SA structure was calculated from the 35 SA structures by averaging superimposed coordinates. The average structure was also minimized using the same potential as in stage v of the SA protocol. The structures were analyzed with respect to the precision of atomic positions and dihedral angles, constraint violations, deviations from idealized bond geometries and non-bonded interaction potentials, and further characterized with respect to dihedral angle conformations and hydrogen bonding. Dihedral angle order parameters, S, reflecting the precision of the corresponding dihedral within the ensemble, were calculated according to Hyberts et al. A value of S approaching unity indicates a very well-defined dihedral angle whereas an isotropic distribution yields S = 0 (but S = 0 does not necessarily reflect an isotropic distribution). The ensemble of SA structures was also searched for additional intramolecular hydrogen bonds using the following two criteria: the distance between the donor hydrogen and acceptor oxygen and the distance between the two heavy atoms must be less than 2.5 Å and 3.5 Å, respectively. Hydrogen bonds mentioned in the text fulfil these criteria in at least 18 of the 35 SA structures. Structural r.m.s. differences quoted in the text refer to comparisons with the average structure. It should be noted that r.m.s. difference comparisons containing 'all atoms' can sometimes be erroneous and too large due to the specific atom labelling of phenyl and tyrosyl rings and carboxylate groups. This is because the computer program evaluating r.m.s. differences does not always consider the inherent symmetry of these groups and therefore can give a large r.m.s. difference even in the case of perfect overlap (P. Kraulis, personal communication).
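
The angular order parameter referred to above can be computed as the length of the mean unit vector of a dihedral angle over the ensemble, S = (1/N)·|Σ (cos θ, sin θ)|. The sketch below follows that definition; the angle lists in the example are invented. It also illustrates the caveat in the text: two equally populated conformations 180° apart give S = 0 although the distribution is far from isotropic.

```python
import math


def angular_order_parameter(angles_deg) -> float:
    """Length of the mean unit vector of a set of dihedral angles (0 to 1)."""
    angles = list(angles_deg)
    if not angles:
        raise ValueError("at least one angle is required")
    x = sum(math.cos(math.radians(a)) for a in angles)
    y = sum(math.sin(math.radians(a)) for a in angles)
    return math.hypot(x, y) / len(angles)


if __name__ == "__main__":
    # Hypothetical ensembles of one dihedral angle (degrees).
    print(round(angular_order_parameter([-60.0, -62.0, -58.0, -61.0]), 3))
    print(round(angular_order_parameter([0.0, 180.0, 0.0, 180.0]), 3))
```
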
Gene Cloning, Expression, and Characterization of the Sac7 Proteins from the
Hyperthermophile Sulfolobus acidocaldarius

James G. McAfee,* Stephen P. Edmondson, Prasun K. Datta,§ John W. Shriver,* and Ramesh Gupta*
Department of Medical Biochemistry, School of Medicine, Southern Illinois University, Carbondale, Illinois 62901-4413
Received March 28, 1995; Revised Manuscript Received May 19, 1995

ABSTRACT: The genes for two Sac7 DNA-binding proteins, Sac7d and Sac7e, from the extremely thermophilic archaeon Sulfolobus acidocaldarius have been cloned into Escherichia coli and sequenced. The sac7d and sac7e open reading frames encode 66 amino acid (7608 Da) and 65 amino acid (7469 Da) proteins, respectively. Southern blots indicate that these are the only two Sac7 protein genes in S. acidocaldarius, each present as a single copy. Sac7a, b, and c proteins appear to be carboxy-terminal modified Sac7d species. The transcription initiation and termination regions of the sac7d and sac7e genes have been identified along with the promoter elements. Potential ribosome binding sites have been identified downstream of the initiator codons. The sac7d gene has been expressed in E. coli, and various physical properties of the recombinant protein have been compared with those of native Sac7. The UV absorbance spectra and extinction coefficients, the fluorescence excitation and emission spectra, the circular dichroism, and the two-dimensional double-quantum filtered NMR spectra of the native and recombinant species are essentially identical, indicating essentially identical local and global folds. The recombinant and native proteins bind and stabilize double-stranded DNA with a site size of 3.5 base pairs and an intrinsic binding constant of 2 x 10^[...] M⁻¹ for poly[dGdC]·poly[dGdC] in 0.01 M KH₂PO₄ at pH 7.0. The availability of the recombinant protein permits a direct comparison of the thermal stabilities of the methylated and unmethylated forms of the protein. Differential scanning calorimetry demonstrates that the native protein is extremely thermostable and unfolds reversibly at pH 6.0 with a Tm of approximately 100 °C, while the recombinant protein unfolds at 92.7 °C.



Small basic DNA-binding proteins have been isolated from various archaea, some of which have been shown to be associated with the nucleoid or chromatin and presumably perform a histone-like or helix-stabilizing function in these organisms (Searcy, 1975; Stein & Searcy, 1978; Searcy & Delange, 1980; Thomm et al., 1982; Grote et al., 1986; Lurz et al., 1986; Choli et al., 1988a,b; Reddy & Suryanarayana, 1989; Sandman et al., 1990), although the actual function of many of these proteins has not been demonstrated. HTa protein from the thermophilic archaeon Thermoplasma acidophilum shows considerable homology to eukaryotic histones and Escherichia coli HU protein (Searcy, 1975; Searcy & Delange, 1980). HMf1 and HMf2, two DNA-binding proteins from Methanothermus fervidus, are also homologous to some of the eukaryotic histones (Sandman et al., 1990).

Sulfolobus, a thermoacidophilic archaeon, expresses a number of small basic DNA-binding proteins ranging in molecular weight from 7000 to 10 000 (Kimura et al., 1984; Grote et al., 1986; Choli et al., 1988a). These have no apparent homology to any of the histones. Much of the early work on these proteins resulted from a search for chromatin proteins that might stabilize the genomic DNA at the high growth temperature. Sulfolobus acidocaldarius grows optimally in the range of 70-80 °C, while Sulfolobus solfataricus grows optimally at approximately 75-85 °C. The G+C base composition of Sulfolobus DNA is about 40%, and its cellular salt concentration is relatively low, making a helix-stabilizing protein presumably necessary (Reddy & Suryanarayana, 1988). The 7 kDa class of proteins has been presented as a likely candidate given that they are present in relatively large amounts in the cell (Grote et al., 1986; Choli et al., 1988a,b).

Five proteins have been isolated in the 7 kDa class from S. acidocaldarius (Kimura et al., 1984; Choli et al., 1988b), and have been labeled Sac7a through Sac7e, in order of increasing basicity. Four of these, Sac7a, b, d, and e, have been sequenced (Figure 1) (Kimura et al., 1984; Choli et al., 1988b), and only minor differences among them have been noted. The sequence of Sac7c has not been reported. The number of genes encoding the 7 kDa proteins of S. acidocaldarius has not been determined. Comparison of the amino acid sequences indicates that there must be at least two separate genes coding the 7d and 7e species. The high degree of similarity observed in the primary sequence of the 7d and 7e proteins suggests that two genes arose through gene duplication. Sac7a and Sac7b are truncated versions of the Sac7d protein, most likely resulting from truncated genes, posttranslational processing, or degradation during isolation.

Abbreviations: DSM, Deutsche Sammlung für Mikroorganismen; IPTG, isopropyl β-D-thiogalactopyranoside; NMR, nuclear magnetic resonance; COSY, correlation spectroscopy; DQF-COSY, double-quantum filtered correlation spectroscopy; DSC, differential scanning calorimetry; CD, circular dichroism; Sac7, a group of 7 kDa DNA-binding proteins from Sulfolobus acidocaldarius, individually referred to as Sac7a, Sac7b, Sac7c, Sac7d, and Sac7e, in order of increasing basicity; Sso7, a group of 7 kDa DNA-binding proteins from Sulfolobus solfataricus.

Specific ε-aminomonomethylation of lysines 4 and 6 is characteristic of the Sac7a, b, and d proteins, while Sac7e is monomethylated at lysines 6, 62, and 63 (residue 4 is an arginine in Sac7e) (Kimura et al., 1984; Choli et al., 1988b). No lysine methylation has been detected in the C-terminus of Sac7a, b, or d, presumably since there are no lysines at positions 62 and 63 in these proteins, although Sac7d contains lysines at positions 64 and 65. The Sso7d protein from S. solfataricus is monomethylated at lysines 4 and 6 and also at lysines 62, 64, and 65 (Choli et al., 1988a). The role of lysine monomethylation has not been determined but is most likely nontrivial given the specificity (there are 12-14 lysines in these proteins) and the occurrence in both S. acidocaldarius and S. solfataricus proteins. Baumann et al. (1994) have recently shown that an increase in Sso7d methylation occurs upon heat shock and indicate that methylation may be directly related to protein stability. However, methylation may be an incidental response to an increase in methylase activity directed at other processes. Methylation may also increase the reversibility of the unfolding process rather than changing the stability. A direct calorimetric measurement of the unfolding and stability of these proteins has not been reported.

The Sac7 proteins would appear to be ideal models for 
studies of protein folding and stability given their small size,
the absence of cysteine, and expected high thermostability.
Biophysical analysis of these proteins is hampered, however,
by the inability to selectively isolate a homogeneous isoform
in large quantities. The differential methylation of individual
7 kDa proteins could further complicate quantitative studies
of structure and stability as well as DNA binding. Therefore,
we have cloned and expressed the gene encoding the Sac7d
species in E. coli to facilitate elucidation of the solution
structure of the protein by NMR with high resolution, probing
of the thermostability and DNA-binding properties of the
protein by site-directed mutagenesis, and determination of
the role of methylation. The availability of recombinant
protein allows for a direct comparison of the stability of the
methylated and unmethylated proteins. In the process of
cloning the sac7d gene, the gene for Sac7e has also been
cloned and sequenced, and we have delineated the transcription
initiation and termination regions of the sac7d and sac7e
genes along with the promoter elements. 

An initial structure of the native Sso7d protein has been 
recently published by Baumann et al. (1994), and a high- 
resolution structure of the homogeneous, recombinant Sac7d 
protein has been completed (Edmondson, Qiu, and Shriver, 
manuscript submitted). There are significant differences 
between these structures, and it remains to be determined if 
these can be attributed to sequence differences, lysine 
methylation, or quality of data due to heterogeneity in the 
native preparation. The spectroscopic, DNA binding, and
calorimetric comparisons of the native and recombinant Sac7 
proteins reported here indicate little difference in structure, 
but significant difference in thermostability. 
MATERIALS AND METHODS

Strains of Microorganisms. E. coli strain DH5αF'IQ [F'
lacIqZΔM15/Δ(lacZYA-argF) recA1 hsdR17(rK- mK+)] was
purchased from Gibco BRL. E. coli strains HMS174 (F-
recA rK12- mK12+ RifR), BL21 (F- ompT rB- mB-), and their
derivatives were generous gifts from F. William Studier
(Studier et al., 1990). E. coli strain CJ236 (dut- ung-) was
obtained from Jack Parker (Southern Illinois University,
Carbondale, IL). S. solfataricus P2 and S. acidocaldarius
DG6 were gifts from Dennis Grogan (Grogan, 1989, 1991).
S. acidocaldarius (DSM 639) and S. solfataricus P1 (DSM
5354) were purchased from Deutsche Sammlung fur Mik- 
roorganismen (DSM). 

The Sulfolobus strain used here was received from W.
Zillig (originally called S. solfataricus P1). We have isolated
a single colony of our organism on solid medium (Grogan,
1989) and have compared the HindIII, EcoRI, and SalI
restriction fragment patterns of its genomic DNA with two
strains of S. acidocaldarius (DG6 and DSM639) and two
strains of S. solfataricus (DSM5354 and P2) according to
Grogan (1989). In each case the restriction pattern of our
organism is identical to the S. acidocaldarius strains and is
distinctly different from the S. solfataricus strains. This has
been further substantiated by Southern analysis of genomic
DNA using Sac7 protein gene specific oligonucleotides (see
Results). We have designated our laboratory strain as S.
acidocaldarius RGJM. There has been confusion in the
literature regarding the identity of the strains of two
Sulfolobus species used in various laboratories at different
times. Zillig (1993) has recently addressed this issue and
tried to clarify the confusion.

Growth of Microorganisms. E. coli strains were grown
in Luria Bertani media (1% bactotryptone/1% NaCl/0.5%
yeast extract) by standard methods (Sambrook et al., 1989).
Small scale cultures of Sulfolobus (10-200 mL) were grown
in Brock's medium (Brock et al., 1972) at 75 °C, supplemented
with 0.2% sucrose. Large scale Sulfolobus cultures
were grown either in a 10 L polypropylene carboy at 78 to ...
°C or in a 16 L VirTis glass fermentor at 70-72 °C with
vigorous aeration using DeRosa's medium (DeRosa &
Gambacorta, 1975) supplemented with 0.1% glucose and
0.1% glutamic acid.

Enzymes and Chemicals. Restriction enzymes, alkaline
phosphatase, T4 DNA ligase, T4 DNA polymerase, and T4
polynucleotide kinase were purchased from New England
Biolabs, Brisco Ltd., BRL, or United States Biochemical Co.
[32P]H3PO4 and 5'-[α-35S]adenosine thiotriphosphate
triethylammonium salt were purchased from ICN Biochemicals,
Inc. and Amersham Co., respectively. Sequenase version
2.0 DNA sequencing kit was obtained from United States
Biochemical Co. Specific deoxyoligonucleotides were purchased
from Research Genetics. The list of the oligonucleotides
used in this work is presented in Table 1. Difco
bacterial media were purchased from Fisher Scientific.
CM52 was obtained from Whatman and Sephacryl S-100-HR
from Sigma Chemical Co. All other chemicals were
reagent grade and obtained primarily from Fisher Scientific,
J. T. Baker Co., and Sigma Chemical Co. Laboratory water
was routinely purified to 18.3 MΩ resistance with a recycling
Barnstead Nanopure system.

Genomic DNA Isolation. Cells from 10-20 mL cultures
were pelleted and resuspended in 0.2-0.3 mL of 10 mM
Tris-HCl, pH 8.0/1 mM EDTA/1% SDS. This solution was
extracted once each with equal volumes of phenol, phenol/
chloroform/isoamyl alcohol (25:24:1), and chloroform/isoamyl
alcohol (24:1). Sodium acetate (3 M, pH 5.2) was added to the
final aqueous phase to a concentration of 0.3 M, followed
by DNA precipitation with three volumes of ice-cold ethanol.
The DNA was spooled onto a thin glass rod, washed in 70%
ethanol, and air dried. The DNA was dissolved in 10 mM
Tris-HCl, pH 8.0/1 mM EDTA.

Cloning, Hybridization, and Sequencing. The preparation
of a PstI genomic library of S. acidocaldarius RGJM in E.
coli strain DH5αF'IQ and screening of the library by colony
hybridization followed published procedures (Berger
& Kimmel, 1987; Sambrook et al., 1989). Southern and dot
blot hybridizations were carried out using nitrocellulose
membranes according to the manufacturer's protocols
(Schleicher & Schuell), which are based on the method of
Southern (1975). The preparation of [γ-32P]ATP and 5' 32P
end-labeling of oligonucleotides was by standard methods
(Johnson & Walseth, 1979; Gupta, 1984; Sambrook et al.,
1989). DNA was sequenced by the dideoxy chain termination
method (Sanger et al., 1977) using a Sequenase version
2.0 kit. The final sequences were determined from both
strands. The standard universal primers for Stratagene's
pBluescript vectors (Short et al., 1988) and specifically
synthesized oligonucleotides were used in sequencing reactions.
DNA sequences were analyzed using the computer
program DNA Inspector IIe (Textco Co.).

Primer Extension. Total RNA from S. acidocaldarius 
RGJM was isolated by previously published procedures 
(Emory & Belasco, 1990). The primer extension assay was 
conducted as described in the Promega "Protocol and
Applications" manual.

Oligonucleotide-Directed Mutagenesis. Procedures for the
oligonucleotide directed mutagenesis were those outlined in
the Bio-Rad Muta-Gene manual and are based on Kunkel's
method (Kunkel et al., 1987) using E. coli dut- ung- strains.
We were unable to propagate the substrate for oligonucleotide
directed mutagenesis, pBluescript KS+/sac7d (see
Results for the description and nomenclature of the plasmids),
in E. coli strain CJ236 (dut- ung-). Therefore, we used
DH5αF'IQ as the host cell for the production of single-
stranded template and as the recipient for transformation with
mutagenized plasmid and modified the procedure for the
selection of mutant plasmid. Colonies arising from transformation
with the plasmids from the mutagenesis reaction
to create the NdeI site were pooled and grown as a mixed
culture. Plasmids isolated from these cells were digested
with NdeI and separated on a 0.8% agarose gel. Linear
plasmids were isolated from the gel, recircularized, and again
used to transform DH5αF'IQ. Plasmids were then extracted
from individual colonies and screened for the presence of
an NdeI restriction site by digestion with the enzyme. Final
confirmation of the desired mutation in the plasmids was
obtained by sequencing.

Gene Expression. For gene expression, pET-3b/sac7d was
transformed into E. coli strain BL21(DE3) pLysS (Studier
et al., 1990). For protein isolation, a 10 mL culture of this
transformant was grown overnight in LB broth containing
ampicillin (200 μg/mL) and chloramphenicol (27 μg/mL).
From this, 0.6-1 mL was used to inoculate 50 mL of fresh
medium. At an A600 of 0.3-0.6, 25 mL of the culture was
diluted into 1 L of new medium. The culture was induced
upon reaching an A600 of 0.8-0.95 by adding IPTG to a final
concentration of 0.4 mM. A small aliquot of each culture
was taken prior to induction to assay for expression and
plasmid stability as described by Studier et al. (1990).
Cultures were harvested at 1 h postinduction and stored at
-70 °C.

Protein Isolation and Purification. E. coli cells containing
recombinant protein were thawed slowly and resuspended
in 100 mL of 10 mM Tris-HCl, pH 7^/0.5 mM phenyl-
methanesulfonyl fluoride, and the cells were lysed by
repeated freezing and thawing along with brief sonication
on ice. To isolate native protein, Sulfolobus cells were
suspended in 0.05 M KH2PO4 buffer (pH 6.8) and lysed by
sonication on ice. DNase I (20 mg/100 mL) was added to
lysed cells, and the suspension was incubated at 37 °C for 5
min followed by centrifugation at 280000g for 60 min. The
supernatant was cooled on ice and dialyzed in SpectraPor
CE 1000 MWCO tubing against 0.2 M H2SO4 overnight at
4 °C. The resulting precipitate was removed by centrifugation
at 180000g for 30 min, and the supernatant was dialyzed
four times against 20 mM Tris-HCl, pH 7.4/1 mM EDTA.
A small amount of precipitate was removed by centrifugation,
and the supernatant was applied to a CM-52 ion exchange
column equilibrated with 20 mM Tris-HCl (pH 7.4). The
protein was eluted with a linear NaCl gradient (0.0-0.3 M)
with both the native and recombinant Sac7 proteins giving
a primary peak at approximately 0.2 M NaCl. Further
purification was accomplished by gel exclusion chromatography
on Sephacryl S-100-HR in 0.02 M Tris-HCl (pH 7.4).

The identity and purity of the 7 kDa proteins were 
monitored by nonreducing SDS gel electrophoresis (Schagger 
& von Jagow, 1987). The recombinant protein showed a 
single band that comigrated with the mixture of Sac7 native 
proteins isolated from S. acidocaldarius (Figure 2) and was 
absent in preparations from control E. coli cells lacking the
recombinant plasmid (data not shown). The Sso7 proteins
run slightly ahead of Sac7 proteins, consistent with a
molecular weight of 7020 (calculated from the sequence).
The Schagger-von Jagow gel used here did not resolve the
individual Sac7 and Sso7 native species. The identity of
the recombinant Sac7d protein was confirmed by comparison
of the double-quantum filtered COSY spectra of native Sac7
and recombinant Sac7d proteins (see below) and by the
consistency of the sequence specific 1H NMR assignments
with the expected sequence (Edmondson, Qiu, and Shriver,
submitted).

In earlier studies the recombinant protein was isolated by
a different procedure (McAfee, 1993). E. coli cells were
lysed and DNase treated as above but without sonication.
The pH of the supernatant was adjusted to 1.5 with 5 M
H2SO4. After 45 min on ice and centrifugation, the
supernatant was neutralized with 10 N NaOH. The mixture
was incubated in a water bath at 70 °C for 2 h, followed by
centrifugation. The supernatant was dialyzed three times
against 1 mM NaH2PO4 buffer (pH 7.0) followed by CM-52
chromatography as above. 

Molecular Weight Determination. Approximate molecular
weights of the native and recombinant Sac7 proteins were
determined by gel exclusion chromatography on Sephacryl
S-100-HR. Cytochrome c, myoglobin, carbonic anhydrase,
and bovine serum albumin were used as molecular weight
standards, and blue dextran and DNP-alanine were used to
measure the column void and total volumes, respectively.
The molecular weights were determined as described by
Mayes (1984).
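
A common way to carry out this kind of gel exclusion calibration is to regress the partition coefficient Kav = (Ve - V0)/(Vt - V0) of the standards on log10 of their molecular weights and to read the unknown off the fitted line. The sketch below assumes that approach, which is not necessarily the exact procedure of Mayes (1984). The function names, the nominal molecular weights of the standards, and all volumes are illustrative assumptions, not values from this work.

```python
import math

def kav(ve, v0, vt):
    """Partition coefficient from elution (ve), void (v0), and total (vt) volumes."""
    return (ve - v0) / (vt - v0)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mw(ve, v0, vt, standards):
    """Estimate a molecular weight from a calibration of Kav against log10(MW).

    standards: list of (molecular_weight, elution_volume) pairs.
    """
    xs = [math.log10(mw) for mw, _ in standards]
    ys = [kav(v, v0, vt) for _, v in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** ((kav(ve, v0, vt) - intercept) / slope)

# Hypothetical column: V0 from blue dextran, Vt from DNP-alanine (mL).
V0, VT = 40.0, 120.0
# Nominal molecular weights (assumed) paired with made-up elution volumes (mL).
STANDARDS = [
    (12400, 69.0),  # cytochrome c
    (17000, 64.5),  # myoglobin
    (29000, 57.0),  # carbonic anhydrase
    (66000, 45.5),  # bovine serum albumin
]
print(round(estimate_mw(74.0, V0, VT, STANDARDS)))
```

Because the unknown here elutes later than the smallest standard, the estimate is an extrapolation of the calibration line and should be read as approximate.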

Phosphorylation and Glycosylation Assays. Phosphate
analysis was performed by the method of Fiske and Subbarow
(Fiske & Subbarow, 1925; Leloir & Cardini, 1957).
Small aliquots of Sac7 (0.95 mL of a 0.5 mg/mL solution
in 0.02 M Tris-HCl, pH 7.0) were incubated at 37 °C for 1
h with 0.05 mL of bovine intestinal alkaline phosphatase
(2.5 mg/mL in 0.01 M Tris-HCl, pH 9.8). The protein was
precipitated with 0.10 mL of concentrated perchloric acid,
incubated on ice for 10 min, and centrifuged for 5 min at
13 000 rpm. To 0.90 mL of supernatant was added 2.0 mL
of distilled water, 1.0 mL of 5 N H2SO4, 1.0 mL of 2.5%
ammonium molybdate, and 0.10 mL of reducing agents
[prepared fresh by dissolving 0.25 g of reducing mixture
(sodium bisulfite, sodium sulfite, and 1-amino-2-naphthol-
4-sulfonic acid in a 46:46:8 ratio) in 10 mL of water]. The
solutions were allowed to stand for 20 min, and the
absorbance was measured at 660 nm. A standard curve was
prepared using known amounts of a 0.01 M KH2PO4 solution.
O-Phosphoserine, treated with alkaline phosphatase as
described for Sac7, gave quantitative recovery of phosphate.
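
The volumes quoted in this assay fix how much protein each color reaction represents, so a phosphate reading can be expressed per mole of protein. The following is a minimal sketch of that bookkeeping, assuming additive volumes, a linear standard curve, and the calculated Sac7d molecular weight of 7608 from the Results as a stand-in for the native mixture. The standard-curve slope used in the example is hypothetical.

```python
def protein_nmol_assayed(protein_mg_per_ml=0.5, protein_ml=0.95,
                         phosphatase_ml=0.05, acid_ml=0.10,
                         supernatant_ml=0.90, mol_weight=7608.0):
    """Nanomoles of protein represented by the assayed supernatant."""
    # mg divided by g/mol gives mmol; 1 mmol = 1e6 nmol.
    total_nmol = protein_mg_per_ml * protein_ml / mol_weight * 1e6
    total_volume = protein_ml + phosphatase_ml + acid_ml  # assumes additive volumes
    return total_nmol * supernatant_ml / total_volume

def phosphate_nmol(a660, slope, intercept=0.0):
    """Invert a linear standard curve A660 = slope * nmol + intercept."""
    return (a660 - intercept) / slope

def phosphate_per_protein(a660, slope, intercept=0.0):
    """Moles of phosphate detected per mole of protein in the assay."""
    return phosphate_nmol(a660, slope, intercept) / protein_nmol_assayed()

# Hypothetical standard curve: 0.004 absorbance units per nmol phosphate.
print(round(protein_nmol_assayed(), 2))
print(round(phosphate_per_protein(0.02, 0.004), 3))
```

With the quoted volumes, one phosphate per protein molecule would correspond to roughly 51 nmol of phosphate in the assayed supernatant, which indicates the sensitivity the standard curve needs to cover.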

The phenol-sulfuric acid reaction was used to assay
carbohydrate content of Sac7 protein (Dubois et al., 1956;
Hirs, 1967). To 1.0 mL aliquots of Sac7 protein solution
(0.3 mg/mL) was added 0.25 mL of 80% phenol and 2.5
mL of concentrated sulfuric acid. After mixing, the solutions
were left at room temperature for 10 min and then placed in
a 25 °C water bath for 20 min. The absorbance was
measured at 489 nm. Known amounts of α-D-glucose were
used to construct a standard curve.

Protein Extinction Coefficient. Ultraviolet and visible 
spectra were recorded on a Cary 210 spectrophotometer at 
25 °C. The wavelength accuracy was checked using benzene
vapor and found to be accurate to within ±0.3 nm, and the
absorbance accuracy was checked using potassium chromate
in 0.05 M KOH (Gordon & Ford, 1972) and found to be 
accurate to within 1%. 

The extinction coefficients of both the native Sac7 and 
recombinant Sac7d proteins were determined by measuring 
the amino acid concentration using the ninhydrin reaction 
(Moore & Stein, 1954) for a sample of known absorbance. 
A standard curve was prepared using amino acid standard 
H (Pierce Biochemicals) and converted into leucine molar 
equivalents. The concentration of amino acid standards was 
checked using tyrosine with an extinction coefficient of ε274
= 1340 in 0.1 M HCl. The molar concentration of amino
acid residues in the samples was calculated by dividing
leucine equivalents by the average color yield based on the
amino acid composition (Moore & Stein, 1954). The average
color yields for Sac7d, lysozyme, and RNase A were 1.0,
1.05, and 1.06, respectively. The extinction coefficients of
lysozyme and RNase A standards were checked by this
procedure and found to be within 1% of published values.
The procedure gave an extinction coefficient of 1.03 ± 0.05
mL/(mg·cm) for both native and recombinant proteins.

The extinction coefficients were also determined by the
method of van Iersel et al. (1985) immediately following
chromatography of the proteins on Sephadex G-50 in 0.01
M NaH2PO4 buffer (pH 6.5). A flat (±0.0005 absorbance
units) spectrophotometer baseline was programmed using the
same buffer which had been used to equilibrate the column.
Protein spectra were collected on samples directly from the
gel exclusion column, generally using only those samples
with an absorbance less than 2.0 at 205 nm to minimize the
effects of stray light. The reproducibility of the A280/A205
ratio using different aliquots collected through the protein
peak as it eluted from the column was found to be on the
order of 99%. The linear relationship between the extinction
coefficient at 280 nm and the ratio of the absorbance at 280
and 205 nm was confirmed in our hands using bovine
α-chymotrypsin (Worthington), hen egg white lysozyme
(Sigma), bovine pancreatic ribonuclease A (Sigma), ...
(Sigma), β-lactoglobulin (Sigma), and bovine serum albumin
(Sigma). A linear fit of the standards yielded a standard
curve such that

$$ \epsilon_{280}^{0.1\%} = 35.76 \, \frac{A_{280}}{A_{205}} - 0.04 $$

with a correlation coefficient of 0.999 and a standard
deviation for the slope of 0.62 and 0.03 for the y intercept.
The extinction coefficients for the native and recombinant
protein were found to be identical with this technique at ...
mL/(mg·cm) with a standard deviation of 0.008 mL/(mg·cm).
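
The standard curve quoted here (slope 35.76, intercept -0.04) can be applied directly to a measured pair of absorbances. The sketch below assumes the curve relates the 280 nm extinction coefficient in mL/(mg·cm) to the A280/A205 ratio, as the surrounding text describes. The function name and the example absorbances are hypothetical, and rejecting readings at or above 2.0 at 205 nm is a strict reading of the text, which says only that such samples were generally avoided.

```python
SLOPE = 35.76       # from the fitted standard curve in the text
INTERCEPT = -0.04   # from the fitted standard curve in the text
MAX_A205 = 2.0      # readings above this were generally avoided (stray light)

def extinction_280(a280, a205):
    """Extinction coefficient at 280 nm, mL/(mg cm), from the A280/A205 ratio."""
    if not 0.0 < a205 < MAX_A205:
        raise ValueError("A205 must be positive and below the stray-light limit")
    return SLOPE * (a280 / a205) + INTERCEPT

# Hypothetical fraction from the column eluate.
print(round(extinction_280(0.045, 1.5), 3))
```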

The extinction coefficients were also calculated to be ...
mL/(mg·cm) in 6 M guanidine hydrochloride, based on the
amino acid content of the protein using the procedure of
Edelhoch (Edelhoch, 1967; Gill & von Hippel, 1989),
assuming εTyr = 1280 M^-1 cm^-1 and εTrp = 5690 M^-1 cm^-1 in
6 M guanidine hydrochloride. An increase in absorbance
of 3.5% was noted upon denaturation of the protein in 6
M GdnHCl, so the calculated extinction coefficient of the
folded protein was corrected to 1.05 mL/(mg·cm). The
estimated error was taken to be ±0.04 with a maximum error
of ±0.15 (Gill & von Hippel, 1989).
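
The Edelhoch calculation is a weighted sum of the tyrosine and tryptophan contributions divided by the molecular weight. A minimal sketch follows, using the two molar coefficients quoted in the text and the calculated Sac7d molecular weight of 7608 from the Results. The residue counts (one Trp, two Tyr) are an assumption about the Sac7d composition that is not stated in this section, and the correction for the 3.5% absorbance change is modeled as a simple division.

```python
EPS_TYR = 1280.0  # M^-1 cm^-1 in 6 M guanidine hydrochloride (from the text)
EPS_TRP = 5690.0  # M^-1 cm^-1 in 6 M guanidine hydrochloride (from the text)

def denatured_extinction(n_trp, n_tyr, mol_weight):
    """Specific extinction coefficient of the denatured protein, mL/(mg cm)."""
    molar = n_trp * EPS_TRP + n_tyr * EPS_TYR  # M^-1 cm^-1
    # Dividing by g/mol gives L/(g cm), which is numerically mL/(mg cm).
    return molar / mol_weight

def folded_extinction(denatured, fractional_increase):
    """Undo the absorbance increase observed on denaturation."""
    return denatured / (1.0 + fractional_increase)

N_TRP, N_TYR = 1, 2  # assumed composition, not given in this section
MOL_WEIGHT = 7608.0  # calculated molecular weight of Sac7d (Results)

denat = denatured_extinction(N_TRP, N_TYR, MOL_WEIGHT)
folded = folded_extinction(denat, 0.035)
print(f"denatured: {denat:.3f}  folded: {folded:.3f}")
```

Under these assumptions the folded-state value rounds to 1.05 mL/(mg·cm), consistent with the corrected coefficient reported in the text.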

Circular Dichroism. Circular dichroism spectra of purified
native Sac7 and recombinant Sac7d proteins were measured
at room temperature in a 0.01 cm path length cylindrical
cell on an AVIV 62DS spectropolarimeter. CD data were
collected at 1 nm intervals using averaging times of 1 ...
s/nm, depending on the signal-to-noise ratio. Relatively high
signal-to-noise ratios made signal averaging of multiple scans
unnecessary. The spectral bandwidth was 1.5 nm. Baselines
were measured using water and subtracted from the sample
CD. Sample concentrations ranged from 0.2 to 0.7 m...
Protein concentrations were determined from UV absorbance
spectra measured in 1 cm cuvettes. The molar CD per
peptide bond was determined using standard procedures
(Johnson, 1984) along with the UV extinction coefficients
determined above. CD spectra were smoothed as described
by Savitzky and Golay (1964). The CD was calibrated at
290.5 nm with d-camphor-10-sulfonic acid using Δε =
2.36, and the ratio Δε192.5/Δε290.5 was -2.10 (Chen & ...,
1977).

The fractions of protein secondary structures were determined
by fitting the CD spectra from 260 to 184 nm at ...
nm intervals using the variable selection method of Johnson
(Manavalan & Johnson, 1987). The results reported are
averages plus or minus one standard deviation of all possible
combinations of 22 reference proteins taken 19 at a time
that (1) have secondary structure components greater than
-0.05, (2) have sums of secondary structures between ...
and 1.1, and (3) have an rms error between measured and
calculated CD spectra less than 0.21 Δε units. The number
of fits meeting these selection criteria was greater than 250
for native and recombinant protein.
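
The selection step described here amounts to enumerating every 19-member subset of the 22 reference proteins, fitting the spectrum with each subset, and keeping only the fits that pass the three criteria. The sketch below illustrates only the counting and filtering logic; it does not implement the spectral fit itself. The lower bound on the sum of fractions is not legible in this copy, so the value 0.9 is a placeholder assumption, and the function names and example fits are hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import mean, stdev

N_REFERENCE, N_TAKEN = 22, 19

def acceptable(fractions, rms_error, min_component=-0.05,
               sum_low=0.9, sum_high=1.1, max_rms=0.21):
    """Apply the three selection criteria to one fit.

    sum_low is an assumed placeholder; the source value is not legible.
    """
    total = sum(fractions.values())
    return (all(value > min_component for value in fractions.values())
            and sum_low < total < sum_high
            and rms_error < max_rms)

def summarize(fits):
    """Mean and standard deviation per structure class over accepted fits.

    fits: list of (fractions_dict, rms_error); needs at least two accepted fits.
    """
    kept = [fractions for fractions, rms in fits if acceptable(fractions, rms)]
    summary = {}
    for name in kept[0]:
        values = [fractions[name] for fractions in kept]
        summary[name] = (mean(values), stdev(values))
    return summary, len(kept)

# Number of reference subsets that would each be fitted once.
subset_count = sum(1 for _ in combinations(range(N_REFERENCE), N_TAKEN))
print(subset_count, comb(N_REFERENCE, N_TAKEN))
```

There are 1540 such subsets, so the reported figure of more than 250 accepted fits means that only a minority of the candidate reference sets met all three criteria.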

Nuclear Magnetic Resonance. NMR spectra were collected
on a Varian 500 MHz NMR spectrometer with the
magnet installed on a TMC Micro-g triangular antivibration
table. All data were collected at 35 °C in 90% H2O/10%
D2O, pH 4.1, with a protein concentration of approximately
10 mM. The pH was adjusted with DCl and NaOD using a
Radiometer glass electrode and was not corrected for the
deuterium isotope effect (Bundi & Wüthrich, 1979). The
chemical shifts are referenced to the water resonance at 4.73
ppm at 35 °C [measured relative to sodium 4,4-dimethyl-
4-silapentanesulfonate (DSS) in a separate experiment
without protein].

Phase-sensitive double-quantum filtered COSY (DQF-
COSY) spectra were collected using standard procedures
(Rance et al., 1983). Typically, 1024 data points were
collected in the t2 domain with 512 increments in the t1
domain, each the sum of 32 scans with a 3 s relaxation delay.
The spectral width in both dimensions was 6000 Hz. The
water peak was diminished in all experiments by presaturation
during the relaxation delay. Both carrier and decoupler
frequencies were set equal to the water resonance frequency
in all experiments (Zuiderweg et al., 1986).

The NMR data were transferred to a Silicon Graphics
workstation for Fourier transformation and further data
manipulation using FELIX 2.1 (BioSym). The data were
zero-filled to 2048 data points in both dimensions and treated
with a Lorentzian to Gaussian apodization function prior to
Fourier transformation. 

Differential Scanning Calorimetry. Differential scanning
calorimetry was performed with a Microcal MC2 calorimeter.
Temperature calibration was monitored using sealed samples
supplied by Microcal. Heat flow accuracy was periodically
monitored by applying pulses of known magnitude using the
internal heater. In addition, ribonuclease A (Sigma, R5250)
was used as a benchmark test protein and shown to unfold
at pH 2.2 [0.1 M KCl, 0.02 M glycine, ε280 = 0.69 mL/
(mg·cm), MW 13 700] with a Tm of 36.0 °C, a ΔHcal of 74.1
kcal/mol, and a ΔHvH of 74.8 kcal/mol (ΔHcal/ΔHvH ratio of
1.00 ± 0.01), in good agreement with the published values
of Tiktopulo and Privalov (1974).

Protein solutions were exhaustively dialyzed against the
indicated buffer overnight. The sample cell was loaded with
1^9 mL of protein solution, and the reference cell was filled
with the last dialysis buffer. Approximately 30 psi of
nitrogen was applied to the cells during each scan to
minimize degassing during heating. Samples were not
degassed, but, instead, the sample was heated repetitively
three times in the DSC instrument by scanning to 35 °C (i.e.,
below any denaturation endotherm), followed by rapid
cooling. This procedure resulted in the flattest and most
reproducible instrumental baselines.

All DSC experiments were under computer control using
an IBM PC computer interfaced to the Microcal MC2
instrument. A scan rate of 1 deg/min was used in all
experiments. The computer interface and data collection
software were supplied by Microcal. Multiple, repetitive
scans were performed on the same sample to check for
reversibility, with identical cooling and equilibration times
between scans.



The DSC raw data, in the form of heat flow (mcal/min)
as a function of temperature, were transferred to a Macintosh
Quadra computer for analysis. The raw data were converted
to excess heat capacity [kcal/(deg·mol)] by dividing each data
point by the scan rate and the concentration of protein in
the sample cell. All baselines were corrected by subtraction
of DSC scans of the buffer against which the protein had
been dialyzed. The heat capacity data were fit by using in-
house nonlinear least-squares fitting routines to obtain the
midpoint temperature of the transition and both the calorimetric
and van't Hoff enthalpies. The basis of the programs
has been described elsewhere (Shriver & Kamath, 1990).
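
The unit conversion described here, and the two enthalpies obtained from the resulting curve, can be illustrated with a simplified calculation. The sketch below is not the in-house nonlinear least-squares routine referred to in the text: it integrates a baseline-corrected curve numerically for the calorimetric enthalpy and uses the standard two-state relation for the van't Hoff enthalpy. The function names are hypothetical, and the cell volume is passed in explicitly because the amount of protein in the cell is concentration times volume.

```python
R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def excess_heat_capacity(heat_flow_mcal_min, scan_rate_deg_min,
                         conc_mM, cell_volume_mL):
    """Convert heat flow (mcal/min) to molar heat capacity, kcal/(deg mol)."""
    mcal_per_deg = heat_flow_mcal_min / scan_rate_deg_min
    micromoles = conc_mM * cell_volume_mL  # mmol/L times mL gives micromoles
    # mcal per micromole is numerically equal to kcal per mole.
    return mcal_per_deg / micromoles

def calorimetric_enthalpy(temps, cp):
    """Trapezoidal area under a baseline-corrected excess Cp curve, kcal/mol."""
    return sum(0.5 * (cp[i] + cp[i + 1]) * (temps[i + 1] - temps[i])
               for i in range(len(temps) - 1))

def vant_hoff_enthalpy(tm_kelvin, cp_max, dh_cal):
    """Two-state estimate: dH_vH = 4 R Tm^2 Cp,max / dH_cal, kcal/mol."""
    return 4.0 * R_KCAL * tm_kelvin ** 2 * cp_max / dh_cal

# Hypothetical data point: 2.0 mcal/min at 1 deg/min, 0.5 mM protein, 1.2 mL cell.
print(round(excess_heat_capacity(2.0, 1.0, 0.5, 1.2), 3))
```

A ratio of calorimetric to van't Hoff enthalpy close to one, as reported for the ribonuclease A benchmark, is the usual indication that a transition is well described as two-state.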

Fluorescence. Fluorescence titration measurements were
performed on an SLM 8000C spectrofluorimeter with 4 nm
excitation and 8 nm emission slit widths. Binding titrations
were performed with excitation at 295 nm and emission
monitored at 350 nm. Reverse titrations were performed by
adding aliquots of concentrated nucleotide solutions to a
known concentration of protein in a 4 mL fluorescence quartz
cell with stirring using a magnetic "flea" within the cell.
Nucleic acid concentrations were determined spectrophotometrically
using an extinction coefficient of 8400 L/(cm·mol)
for poly[dGdC]·poly[dGdC] (Wells, 1970) and 6600
L/(cm·mol) for poly[dAdT]·poly[dAdT] (Inman, 1962). All
experiments were performed at 25 °C. The fluorescence
intensity was constant at high DNA concentrations, and thus
no correction was made for the inner filter effect. Apparently,
any decrease in fluorescence due to the inner filter
effect was balanced by other effects, such as scattering by
the DNA-protein complexes. Photobleaching was not observed
during the titrations. Binding parameters were
obtained by using a simple, noncooperative McGhee-von
Hippel model (McGhee & von Hippel, 1974).
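
For reference, the noncooperative McGhee-von Hippel isotherm relates the binding density ν (bound protein per lattice residue) to the free protein concentration L through ν/L = K(1 - nν)[(1 - nν)/(1 - (n - 1)ν)]^(n-1), where K is the intrinsic association constant and n the site size. The sketch below evaluates that relation and inverts it numerically. It is an illustration of the model only, not the fitting procedure used in this work, and the function names and parameter values are hypothetical.

```python
def free_ligand(nu, k_assoc, site_size):
    """Free ligand concentration that gives binding density nu."""
    if not 0.0 <= nu < 1.0 / site_size:
        raise ValueError("nu must lie in [0, 1/n)")
    vacant = 1.0 - site_size * nu
    gap_term = (vacant / (1.0 - (site_size - 1) * nu)) ** (site_size - 1)
    return nu / (k_assoc * vacant * gap_term)

def binding_density(ligand, k_assoc, site_size, iterations=200):
    """Invert the isotherm by bisection; free_ligand rises monotonically with nu."""
    low, high = 0.0, 1.0 / site_size
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if mid <= low or mid >= high:
            break  # interval can no longer be split in floating point
        if free_ligand(mid, k_assoc, site_size) < ligand:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Hypothetical parameters: K = 1e5 M^-1, site size of 4 lattice residues.
print(round(binding_density(2.0e-6, 1.0e5, 4), 4))
```

For a site size of one the expression reduces to a simple Langmuir isotherm, which gives a convenient check on any implementation.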

DNA Stabilization. Thermal denaturation studies of DNA
and DNA-protein complexes were performed on a Cary 210
spectrophotometer equipped with water-jacketed cuvette
holders and a circulating water bath calibrated to within ±0.3
°C. Melting curves are scaled to an A262 of 1.0 at 20 °C for
the DNA component of DNA-protein mixtures.

Sequence Analysis. BLAST (Altschul et al., 1990) searching
and alignment were performed using the NCBI server
(blast@ncbi.nlm.nih.gov) against the "nr" (nonredundant)
sequence database (including Brookhaven Protein Data Bank,
January 1994 release; SWISS-PROT Release 29.0, June
1994; PIR Release 41.0, June 30, 1994; CDS Translations
from GenBank Release 83.0, June 15, 1994; Kabat Sequences
of Proteins of Immunological Interest Release 5.0, August
1992; TFD Transcription Factor Database Release 7.6, June
1993). BLITZ and FASTA searches of the latest SWISS-
PROT database were performed using the EMBL servers
(blitz@embl-heidelberg.de and fasta@embl-heidelberg.de).
Database retrieval was performed using the GDB/Accessor
(Johns Hopkins University) available from ftp.gdb.org.
MacPattern (Fuchs, 1991) (fuchs@embl-heidelberg.de) was
utilized for BLOCKS (Henikoff & Henikoff, 1991) and
PROSITE (Bairoch, 1992) analysis on a Quadra 700
(BLOCKS database Version 7.01 was utilized with 2679
entries and PROSITE database version 12.0, June 1994, was
used with 1021 entries, both obtained from the NCBI ftp
site ncbi.nlm.nih.gov). The MacVector software package
(IBI) was utilized for protein secondary structure analysis.
Table 1: List of Oligonucleotides

oligonucleotide^a    sequence^b    position^c

A
B
C
D
E
F
G


^a Oligonucleotides A, B, and C were derived from amino acids 9-14,
5-11, and 31-38, respectively, of the Sac7 proteins (Figure 1). These
amino acid sequences are identical in the four Sac7 proteins. ^b N = A,
G, C, or T; Y = C or T; R = A or G. ^c Nucleotide positions correspond
to those in Figure 3. Sequences of oligonucleotides A, C, D, E, F,
and G are complementary to the sequences shown in Figure 3.
Oligonucleotides D and E correspond to the same positions (Figure 3)
for sac7d and sac7e, respectively. ^d Oligonucleotides B and C have
six and four additional nucleotides, respectively, at the 5' termini which
are not derived from the amino acid sequence of the protein. ^e Sequence
of the primer used for oligonucleotide directed mutagenesis. The
underlined G replaces a T in the sac7d gene sequence creating an NdeI
restriction site.
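
The degenerate-base code in these footnotes determines how many distinct sequences a mixed probe contains, and the reverse complement shows which codons a probe was designed against. The sketch below works through both for oligonucleotide A as listed in Table 1 (NACYTCYTTYTCYTCNCC). The function names are hypothetical, and only the three ambiguity symbols defined in the footnote are handled.

```python
# Bases represented by each symbol, following the table footnote.
BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}
# Complement of each symbol; R and Y swap, N stays N.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "R": "Y", "Y": "R", "N": "N"}

def degeneracy(oligo):
    """Number of distinct sequences present in a mixed oligonucleotide."""
    count = 1
    for symbol in oligo:
        count *= len(BASES[symbol])
    return count

def reverse_complement(oligo):
    """Reverse complement that preserves the ambiguity symbols."""
    return "".join(COMPLEMENT[symbol] for symbol in reversed(oligo))

OLIGO_A = "NACYTCYTTYTCYTCNCC"
print(degeneracy(OLIGO_A))
print(reverse_complement(OLIGO_A))
```

Read in triplets, the reverse complement of oligonucleotide A is GGN GAR GAR AAR GAR GTN, the degenerate codons for Gly-Glu-Glu-Lys-Glu-Val, which matches the statement that the probe was derived from residues 9-14.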



RESULTS 

Gene Cloning and Sequence. PstI digested genomic DNA
of S. acidocaldarius RGJM was shotgun cloned in the vector
pUC19 and transformed into E. coli DH5αF'IQ. Approximately
10 000 transformants were screened by colony
hybridization to a mixed oligonucleotide probe (oligonucleotide
A, Table 1) derived from residues 9-14 of the
published amino acid sequence of the S. acidocaldarius 7
kDa proteins (Kimura et al., 1984; Choli et al., 1988a). [The
published amino acid sequences for Sac7a, b, d, and e are
identical over this range (Figure 1) as well as over the ranges
for oligonucleotides B and C.] Tentative positive clones
were restreaked onto selective media and screened a second
time with the same probe. Plasmids isolated from a number
of these positive clones were then independently hybridized
to three different mixed probes (oligonucleotides A, B, and
C, Table 1) by dot blot hybridization. Two clones were
isolated which hybridized to all three probes. Plasmids
isolated from these cells were partially sequenced using
oligonucleotide B as a primer. One of the genes corresponded
with the published protein sequence for the
carboxy-terminal half of the Sac7d protein of S. acidocaldarius
(Kimura et al., 1984; Choli et al., 1988a) with the
exception of one additional lysine at the carboxy terminus,
and the other corresponded to the Sac7e sequence. The genes
which matched the Sac7d and 7e proteins have been
designated sac7d and sac7e, respectively.

Agarose gel analysis of the plasmids carrying the sac7d
(pUC19/sac7d) and sac7e (pUC19/sac7e) genes indicated
that the cloned PstI fragments were greater than 15 kb in
size. Southern blot hybridizations of oligonucleotide C to
the restriction digests of pUC19/sac7d indicated that the sac7d
gene was present on a slightly less than 800 bp EcoRI
fragment. Preliminary sequencing of pUC19/sac7d using
oligonucleotide B as a primer indicated the presence of an
EcoRI site 61 bases downstream of the termination codon
of the protein. Since the published sequence of Sac7d protein
consists of 64 amino acids (Kimura et al., 1984; Choli et
al., 1988a), the second EcoRI site was expected to be
upstream of the start codon. Thus, the EcoRI fragment
hybridizing to probe C was expected to contain the
coding region of the gene. This EcoRI fragment was
subcloned in the vector pBluescript KS+ to produce pBluescript
KS+/sac7d, and the sequence of the sac7d gene was
determined (Figure 3). The sequence of the sac7e gene
(Figure 3) was obtained directly from pUC19/sac7e using
primers complementary to the coding region of the gene.

The GenBank accession numbers for the sac7d and sac7e
gene sequences reported here are M87569 and LC...,
respectively.

Sequence Analysis and Gene Copy Number. The start site of
transcription for both sac7d and sac7e genes was determined
using primer extension analysis (Figure 4). Specific primers
(oligonucleotides D and E, Table 1) that were complementary
to residues 398-418 (Figure 3) of the two genes were used.
A single start site was observed for each of the two genes,
which occurs on a guanosine residue eight nucleotides
upstream from the initiation codon. These guanosine
residues are present within perfect archaeal "B-box" consensus
sequences (consensus ...G...) (Zillig et al., 19...). A
sequence resembling the archaeal "A-box" motif (consensus
TTTA-A) is seen 24 and 23 nucleotides upstream from the
transcription start site for the sac7d and sac7e genes,
respectively (Figure 3). The "A-box" of sac7d has a ...
base match with the consensus sequence, while that of
sac7e has only four matches.

Oligonucleotide F (Table 1) was used to probe genomic
blots of three S. acidocaldarius (RGJM, DG6, and DSM639)
and two S. solfataricus (DSM5354 and P2) strains (Figure
5A). Oligonucleotide F is complementary to a region coding
for residues 34-40 (Figure 1) which are identical for all
S. acidocaldarius 7 kDa proteins (DDNGKTG) and significantly
different from that of S. solfataricus (DEGGG...;
two substitutions and an insertion). Two HindIII restriction
fragments (~3.0 and ~4.6 kb) were recognized by the probe
in all three S. acidocaldarius strains, while no hybridization
to the S. solfataricus strains was observed. This observation
reinforces the assignment of the RGJM strain (our laboratory
strain) as an S. acidocaldarius strain. The results indicate
that the putative genes encoding all of the Sac7 proteins are
present on the two HindIII restriction fragments of ~3.0 and
~4.6 kb in size. Genomic blots of EcoRI, HindIII, and ...
digested S. acidocaldarius RGJM DNA were also probed
with the common oligonucleotide F (Figure 5B), and in each
case hybridization to two bands was observed. One band
in each hybridized to oligonucleotide H, specific for the
untranscribed region upstream of the sac7d gene (Figure ...).
Results of the hybridizations of various restriction digests
of the original pUC/sac7d and pUC/sac7e clones with appropriate
oligonucleotides (data not shown) corroborated the
results in Figure 5 and also indicated that the original clones
each had a single copy of a sac7 gene. The 3.0 and 4.6 kb HindIII
fragments can be correlated with the sac7d and sac7e genes,
respectively. The data indicate that there are only two sac7
genes in the S. acidocaldarius genome, each being present in a
single copy. This reinforces the conclusion that Sac7a and
Sac7b are proteolytically truncated versions of the Sac7d
protein.

Protein Sequence Analysis. The sac7d open reading
can encode a 66 amino acid protein, with a calc
molecular weight of 7608, and the sac7e encodes a 65
acid protein with a calculated molecular weight of



Oligonucleotide   Sequence                      Residues
A                 NACYTCYTTYTCYTCNCC            230-247
B                 GGGAGCTTYAARTAYAARGGNGARGA    218-237
C                 GGGGTACCRTTRTCRTCRTANGTRAA    296-317
D                 TCTTAACAAATTATTTTATTT         398-418
E                 GCCCTTTATACCTTCCCCTTA         398-418
F                 CCTGTCTTACCATTGTCGTC          305-324
G                 CCTTCACCATATGAGGTCAAGTTATe    187-212
H                 GACTTAACTTAATACCG             143-159
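
Several of the oligonucleotides listed above contain IUPAC ambiguity codes (Y, R, N), so each entry stands for a pool of sequences rather than a single one. As a minimal sketch, assuming the standard IUPAC code table and using a hypothetical helper name `degeneracy`, the pool size for the first listed oligonucleotide (residues 230-247) can be counted as follows:

```python
# Number of bases each IUPAC nucleotide code can stand for (standard IUPAC table).
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(oligo: str) -> int:
    """Return how many distinct sequences a degenerate oligonucleotide represents."""
    total = 1
    for base in oligo.upper():
        total *= IUPAC_CHOICES[base]  # raises KeyError on a non-IUPAC character
    return total


# First degenerate oligonucleotide in the table (residues 230-247).
oligo = "NACYTCYTTYTCYTCNCC"
print(len(oligo), degeneracy(oligo))  # two N (x4 each) and four Y (x2 each)
```

The count is simply the product of the choices at each position, so it says nothing about how evenly the pool members were synthesized.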
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Sac7a 
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Sac7a 
Sac7b 
Sac7d 
Sac7e 
Sso7d 



Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Ala-Lys-Val-Arg-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Ala-Thr-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-

20 25 30

Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Ile-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Ile-Ser-

31 35 40 45

Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Phe-Thr-Tyr-Asp-Glu-Gly-Gly-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-

46 50 55 60

Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-Arg-Ala-
Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala
Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-Arg-Ala-
Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Met-Asp-Met-Leu-Ala-Arg-Ala-
Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Gln-Met-Leu

Figure 1: Amino acid sequences of the Sac7a, b, d, and e proteins [after Kimura et al. (1984) and Choli et al. (1988b)] and the Sso7d
protein [after Choli et al. (1988a)]. [Note that the sequence reported by Kimura et al. (1984) was claimed to be for Sso7d but was later
shown to be for Sac7d (Choli et al., 1988a).] Numbering is according to the Sac7d sequence without the initiator methionine. Regions
homologous to the Sac7d protein are outlined. Sac7a, b, and d differ only in length. Lysines which are monomethylated to some extent in
the native protein are indicated with asterisks. The additional C-terminal lysine coded by the sac7d gene described here which was not
indicated in the published protein sequence is enclosed in parentheses.

Gly43 to Ala59. Only the Chou-Fasman algorithm predicts
a small amount of β-sheet (12%) extending from Lys22 to
Lys29 and from Ser31 to Asp36. Reverse turns are predicted
near Asp36 and Gly43. These predictions are not consistent
with the solution structure of the Sac7d protein which has
been determined by 2D NMR (Edmondson, Qiu, and Shriver,
manuscript submitted).

Recombinant Gene Expression. The sac7d gene (in
pBluescript KS+/sac7d) was modified by converting the
hexanucleotide sequence containing the initiation codon
(AATATG) to an NdeI site (CATATG) by oligonucleotide
G (Table 1) directed mutagenesis to produce pBluescript
KS+/sac7d(Nd). The NdeI-BamHI fragment of pBluescript
KS+/sac7d(Nd) carrying the coding region of the sac7d gene
was then subcloned into the NdeI-BamHI site of pET-3b
(Studier et al., 1990) to give pET-3b/sac7d, and transformed
into HMS174 (DE3), HMS174 (DE3) pLysS, BL21 (DE3),
and BL21 (DE3) pLysS (Studier et al., 1990). The plasmid
could be established in all of these strains except BL21
(DE3). Furthermore, in transformed BL21 (DE3) pLysS,
the growth of the organism is impaired and cultures lyse
within 60-70 min after induction with IPTG. On the other
hand, the growth of HMS174 strains was not significantly
affected by the presence of the plasmid, and lysis was not
observed in cultures after 3 h postinduction. The absence
of impaired growth in the presence of the plasmid in these




Figure 2: Schagger and von Jagow (1987) polyacrylamide
nonreducing SDS gel of purified native Sac7 proteins (lane 1),
recombinant Sac7d (lane 2), and native Sso7 (lane 3) proteins
stained with Coomassie Brilliant Blue G-250 (Bio-Rad). The
molecular weight of the Sso7 protein is 7019 based on the published
protein sequence (Choli et al., 1988a), while that of the Sac7d is
7608 based on the DNA sequence presented here. The band
positions of myoglobin (MW 16 900) and insulin (MW 5780) are
indicated for comparison.

(including initiator methionines). Secondary structure analysis
of the sequences of the Sac7d and Sac7e proteins was
performed with both the Chou-Fasman (Chou & Fasman,
1974, 1978) and the Robson-Garnier algorithms (Robson
& Suzuki, 1976; Garnier et al., 1978). Both methods predict
the occurrence of significant α-helix (52%) in both proteins
extending from approximately Lys9 to Lys28 and from
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GTTCTATAGCGrrAATTATCaUCAGTTCnATAAC^^ 
CTTAGACGACAAACCTGTAAATACn^TWriW^^ , 
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strand 
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strand 
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101 



151 



201 



251 



301 
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401 



451 
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551 



mTATrraJVTATTACTAATTATTGTACTGGATTC^^ r. T - 

ATCGTCGTACTCCTCAGATAAATTTCACAAAAGTrAGGGCTATT^^ • 

ACATTATATAGGAAAAATAATTTGAOGTAGTCTCATAAGTATGACTTAAC ' ' 
TAAATTGTAATGlt»TACTAATGATATITGGATATTAAT(n*AATACTGGT ' 

(A-box) ■ " '* ~' ■ -r--- 7.:— 

ITAATACCGTAAGCSlEairtATCACAATATCGTAAGATAAClI^^ -'^ 
ATATTAATGATAATAIIMn?^TOGCGAA-m - ' • 

M- Y K V Z F K Y \ 6 B K E V D . r 
ATATXXSTSAAfiGT^GTTCAAGTATAAGGGTGAAGAGAAAGAMTTAGAC . . ; 
ATATCGC^AAAGTOVfiGTTEAAGTATAAGGGTGAAGAGAAAGAAG^ ' 

MAKV BFKYKG EE K EVD 

TSKIKKVWRVGKMVSFT. 
ACTICAAAGATAAAGAAGGTXrcGAGAGTAOGCAAAATO^^ 
ACTTCAAAGATAAAGAAGGTCTGGAGAGTTGGCAAAATQGTGTCCT^ 

T S K I K K V W R V G K M V S F.T; 

• •; ' < ' ' ■■^ H^^<rsr } 

YDD N GKTGRGAVS E^K D. | ' ; 
CTATCACGACAATOGTAAGACAOGTAGAGGAGCTX3TAAGCGACAAAGATC3 • • ; : 

CTATGACGACAATGGTAAGACAGGTAGAGGAGCTGTAAGCGA&AAAGACG 

YDDNG KTGR G A V,S-E K D 

— is'^. i ' 



E 



B £ K K 



CTCCAAAAGAACTA&' 
A P K E L U 



'AGAC^iSpTAOG^J^^ 



DMLARAEEEK Stop 



stop •• 
TAAAATAATTKriTAAGAAAATCTTCATATAAA TICTTTT^ 

GGGGAAGGTATAAAGGQCTTTlTAAATGTCaAAAGTrnTA^Tnil^^ 

•mTAATTTATTAGAATTC . . . *. • - ^ 

GCATTTCAACTTTAGAAGATCTTTTATAATAGCCTAAATTTCrr^^ 



GGAGTTrTTCCGCTATTCTTAGGCTKX^TAATAATAAT^ 



AGTATT ^ . 

Figure 3: Nucleotide sequences of the sac7d and sac7e genes.
The top and bottom sequences are the nucleotide sequence for the
sac7d and sac7e genes, respectively (aligned using the coding region
of each gene). Numbering starts with the sac7e sequence. The amino
acid sequence coded for by each gene is shown above (sac7d) or
below (sac7e) each nucleotide sequence. Putative promoter (A- and
B-boxes) and termination elements are underlined in the 5' and 3'
noncoding regions of each sequence. Amino acid and nucleotide
differences in the coding region of each gene are also indicated by
underlines. The G at the start of transcription (in the B-box) for
each gene is indicated with an asterisk.

strains was correlated with a lack of Sac7d protein
accumulation. In contrast to HMS174 strains, BL21 and its
derivatives lack the ompT outer membrane protease and are
deficient in the lon protease (Studier et al., 1990). The
ompT protease has been shown to be responsible for T7 RNA
polymerase degradation during protein purification from E.
coli (Grodberg & Dunn, 1988). Thus, it appears that in the
absence of stringent regulation of T7 RNA polymerase
synthesis prior to induction with IPTG, or proteolytic
degradation of the Sac7d protein, the protein accumulates
to lethal levels. However, because significant amounts of
the Sac7d protein do not accumulate in HMS174 strains, we
have utilized BL21 (DE3) pLysS for subsequent expression
and purification of the protein.

Spectroscopic and Chemical Characterization. The UV
spectra of native and recombinant Sac7 proteins were
essentially identical, as expected, given the presence of a
single tryptophan and two tyrosines and two phenylalanines
in all proteins. The calculated extinction coefficient based
on amino acid composition is 1.05 mL/(mg·cm) at 280 nm,
in good agreement with the value of 1.03 mL/(mg·cm)
determined by ninhydrin analysis. The extinction coefficients
were also determined by using the ratio of absorbance
at 280 and 205 nm (see Materials and Methods). The




Figure 4: Determination of the in vivo start of transcription
the sac7d and sac7e genes by primer extension analysis. sac7d (la
d) and sac7e (lane e) specific oligonucleotides D and E, respective
[which are complementary to residues 398-418 (Figure 3)], w
used to prime the synthesis of a complementary strand of
from total S. acidocaldarius RNA. These same oligonucleoti
were also primers in the dideoxy sequencing reactions used
markers for the sac7d (pBSKS+/sac7d) and sac7e genes (pUC
sac7e) indicated. The sequences written on the left and right
complementary to the ones observed in the autoradiogram in
marked region. The start of transcription is indicated in
sequence by an asterisk. The first five coded amino acids of e
protein are also indicated along side each complementary str
sequence.

empirical nature of this method might lead to some quest
of its accuracy, but the high correlation of the results fr
the six standards is extraordinary (r = 0.999), and
reproducibility of the A280/A205 ratio measurement is h
leading to an expected error of 0.6%. The ratio meth
demonstrates that the extinction coefficients of the nat
and recombinant protein are identical, viz., the mean of
extinction coefficient measurements (native and recombin
combined) using this method was 1.18 mL/(mg·cm) wi
standard deviation of 0.008 mL/(mg·cm). The final exti
tion coefficient for both the recombinant and native prot
is taken to be 1.09 mL/(mg·cm), the mean of the th
independent measurements, with a standard error of ±
(calculated by propagating the errors of the three meas
ments). The extinction coefficient was shown to be
independent from 2 to 10.
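
The final extinction coefficient is described as the mean of three independent determinations: 1.05 mL/(mg·cm) from amino acid composition, 1.03 mL/(mg·cm) from ninhydrin analysis, and 1.18 mL/(mg·cm) from the absorbance ratio method. The short sketch below only reproduces that averaging step; the dictionary keys are descriptive labels chosen here, and the propagated standard error is not recomputed because its value is not legible in the text.

```python
from statistics import mean

# The three independent extinction coefficient estimates, in mL/(mg*cm) at 280 nm.
estimates = {
    "amino acid composition": 1.05,
    "ninhydrin analysis": 1.03,
    "A280/A205 ratio method": 1.18,
}

final_coefficient = mean(estimates.values())
print(f"final extinction coefficient: {final_coefficient:.2f} mL/(mg*cm)")
```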

The fluorescence excitation and emission spectra of
native Sac7 and recombinant Sac7d proteins were
essentially identical (data not shown). In addition,
fluorescence emission spectrum was essentially that expe

Figure 5: Southern analysis of Sulfolobus genomic DNA. (A) Autoradiogram of a Southern blot of HindIII digests of genomic DNA from
S. acidocaldarius (RGJM) (lane 1), S. acidocaldarius (DG6) (lane 2), S. acidocaldarius (DSM639) (lane 3), S. solfataricus (DSM5354)
(lane 4), and S. solfataricus (P2) (lane 5) probed with oligonucleotide F. The approximate sizes of the restriction fragments hybridizing to
oligonucleotide F are indicated. (B) Autoradiogram of a Southern blot of EcoRI (lane E), HindIII (lane H), and PstI (lane P) digested S.
acidocaldarius RGJM genomic DNA hybridized with oligonucleotide F. Two closely spaced bands in lane P are clearly evident in the
original autoradiogram. Lane E' is a second independent EcoRI experiment to clearly demonstrate the 0.8 kb fragment. (C) Similar to
panel B except that the DNA was probed with oligonucleotide H.



[Figure 6 plot; x-axis: Wavelength (nm)]



Figure 6: Circular dichroism spectra of native Sac7 (solid line, 
0.26 mg/mL) and recombinant Sac7d (dashed line, 0.66 mg/mL) 
proteins in 0.01 M KH2PO4, pH 7.0. 

for a free tryptophan, indicating that the single tryptophan
is highly solvent exposed in both proteins. Notably, the
fluorescence emission spectra show a small shift upon
DNA binding (data not shown), indicating that the exposure
of the tryptophan changes slightly upon DNA binding. The
CD spectra of native Sac7 and recombinant Sac7d proteins
were also essentially identical (Figure 6). The variable
selection method of Johnson (Manavalan & Johnson, 1987)
indicates that both the native and recombinant Sac7 proteins
are composed of 31% helix (both α- and 3₁₀-helix), 22-25%
β-sheet, 0-2% turn, and 42-45% nonrepetitive structure.

The DQF-COSY spectra of the native and recombinant
Sac7 proteins are remarkably similar (Figure 7). The native
spectrum shows some additional correlation peaks, most
likely due to the presence of 7a, b, c, d, and e isoforms in
the native preparation and posttranslational modifications
(e.g., monomethylation of lysines) in Sulfolobus. The
essential identity of the chemical shifts for the native and
recombinant proteins indicates again that the recombinant
and native proteins are folded similarly. The extensive
number of alpha protons shifted downfield of the water line
at 4.7 ppm indicates the presence of significant β-sheet
structure (Wishart et al., 1992). The wide chemical shift
dispersion has permitted an essentially complete assignment
of the proton resonances and determination of the solution
structure (Edmondson, Qiu, and Shriver, manuscript submitted).

No phosphorylation or glycosylation of either the native
or recombinant proteins could be detected. The recombinant
protein differs from the native by containing the initiator
methionine. The recombinant protein also contains an
additional C-terminal lysine which was not reported in the
amino acid sequence (Kimura et al., 1984), although it
remains to be determined if this is an error in the protein
sequence or if the lysine is actually removed posttranslationally.

DNA Binding. The binding of Sac7 proteins to DNA is
associated with a significant quenching of the intrinsic
fluorescence of the single tryptophan (Trp23) in both the
native and recombinant Sac7 proteins (Figure 8). Binding
of poly[dGdC]·poly[dGdC] in 0.01 M KH2PO4 at pH 7.0
leads to a maximal fluorescence quenching of the native
protein by 88% and the recombinant Sac7d protein by 87%.
Poly[dAdT]·poly[dAdT] shows a maximal quenching of 84%
for both proteins (data not shown). The binding data can
be fit using the McGhee and von Hippel model (McGhee
and von Hippel, 1974) without cooperative interactions,
assuming a linear relationship between fractional quenching
and protein binding. The poly[dGdC]·poly[dGdC] data can
be fit with an intrinsic association constant of 2 x 10^ M⁻¹
for both native and recombinant Sac7d protein and site sizes
of 7 bases (3.5 base pairs) and 6.8 bases for native and
recombinant protein, respectively. Poly[dAdT]·poly[dAdT]
appears to bind slightly more weakly, with an association constant
of 1 x 10' M⁻¹ for both proteins and site sizes of 7.5 bases
for native protein and 6.8 bases for recombinant protein.
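
The noncooperative McGhee and von Hippel model referred to above relates the binding density ν (bound protein per lattice residue) to the free protein concentration L through ν/L = K(1 − nν)[(1 − nν)/(1 − (n − 1)ν)]^(n−1), where K is the intrinsic association constant and n is the site size. The following is a minimal sketch of that relation, not the fitting procedure used in the paper. The function names are hypothetical, and the value of K in the example is a placeholder, since the exponent of the association constant is not legible in the text.

```python
import math


def free_ligand(nu: float, K: float, n: float) -> float:
    """Free protein concentration that yields binding density nu
    (noncooperative McGhee-von Hippel isotherm)."""
    a = 1.0 - n * nu
    if a <= 0.0:
        return math.inf  # the lattice saturates at nu = 1/n
    b = 1.0 - (n - 1.0) * nu
    return nu / (K * a * (a / b) ** (n - 1.0))


def binding_density(L: float, K: float, n: float) -> float:
    """Invert the isotherm by bisection; free_ligand increases monotonically with nu."""
    lo, hi = 0.0, 1.0 / n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if free_ligand(mid, K, n) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only: K is a placeholder, n = 7 bases is the site size quoted above.
K_assumed = 1.0e6   # M^-1, hypothetical
n_bases = 7.0
nu = binding_density(1.0e-6, K_assumed, n_bases)
print(f"binding density: {nu:.4f} per base, fractional saturation: {n_bases * nu:.3f}")
```

With n = 1 the expression reduces to a simple Langmuir isotherm, ν = KL/(1 + KL), which is a convenient sanity check. Under the stated linear assumption, the observed quenching would be the maximal quenching multiplied by the fraction of protein bound.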

The binding of Sac7 to poly[dAdT]·poly[dAdT] significantly
stabilizes the DNA double helix against thermal
denaturation. The UV melting curve of poly[dAdT]·poly[dAdT]
in 0.01 M KH2PO4 is very sharp and has a Tm of
43.5 °C (Figure 9). In the presence of native Sac7d protein,

Figure 7: Double-quantum filtered (DQF-COSY) α to amide
proton correlation spectra of the native Sac7 (A) and recombinant
Sac7d (B) proteins at 35 °C in 90% H2O/10% D2O, pH 4.1. The
protein concentrations in both spectra were approximately 10 mM.

the melting profile of poly[dAdT]·poly[dAdT] broadens and
the Tm increases. At the highest protein concentration used
in this series of experiments, the DNA melting temperature
was increased about 33 °C above that of poly[dAdT]·poly[dAdT]
alone. The recombinant protein increases the Tm of
poly[dAdT]·poly[dAdT] by a similar amount. However, the
recombinant protein differs in that it aggregates as the
double-stranded poly[d(AT)] melts. CD measurements of the
suspension, and the supernatant after allowing the aggregate
to settle, indicate no major conformational changes during
aggregation of the protein-DNA mixture.

Figure 8: Reverse titrations of the native Sac7 (solid circles) a
recombinant Sac7d (open circles) proteins with poly[dGdC]·po
[dGdC] at pH 7.0 (0.01 M KH2PO4), 25 °C with 6.6 µM Sa
proteins and 7.3 µM Sac7d. The smooth curves through the d
are overlays of simulations using a noncooperative McGhee-v
Hippel model (McGhee & von Hippel, 1974). For the native Sa
proteins this corresponds to a site size of 7 bases (3.5 base pai
maximal quenching of 88%, and an intrinsic association const
of 2 x 10^ M⁻¹. For the recombinant Sac7d protein this correspon
to a site size of 6.8 bases (3.4 base pairs), maximal quenching
87%, and an association constant of 2 x 10' M⁻¹.

Figure 9: Thermal denaturation of poly[dAdT]·poly[dAdT] mon
tored by changes in UV absorbance at 262 nm in 0.01 M KH2P
pH 7.0. The melting of poly[dAdT]·poly[dAdT] is shown alo
(open triangles), with native Sac7 proteins (solid circles), and wi
recombinant Sac7d (open circles). The concentration of pol
[dAdT]·poly[dAdT] was 70 µM (nucleotides), and the concentrati
of protein was 350 µM.

Thermal Stability. Sac7 proteins are highly thermostab
as expected from their origin. Native Sac7 and recombina
Sac7d samples heated to 100 °C showed no precipitation
cloudiness, although some increase in scattering was notic
able in the UV spectrum. The proteins unfold reversibly
indicated by the observation of similar endotherms wi
repetitive DSC scans up to 100 °C.

The native Sac7 proteins show a DSC endotherm at p
6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA) with
Tm of 99.0-100.2 °C (data not shown). By comparison,
native Sso7 protein has a Tm of 99.4 under simil
conditions (data not shown). A precise midpoint for t
unfolding transition is difficult to define since data abo

Figure 10: Differential scanning calorimetry (DSC) of native Sac7
(solid circles) and recombinant Sac7d (open circles) proteins at pH
4.0 (0.3 M KCl, 0.05 M potassium acetate). Protein concentrations
were 1.5 mg/mL of native Sac7 proteins and 1.38 mg/mL of
recombinant Sac7d. Smooth curves through the data are nonlinear
least-squares fits with Tm = 80.3 °C, ΔHcal = 53.0 kcal/mol, ΔHvH
= 49.6 kcal/mol, for the recombinant protein; and Tm = 86.8 °C,
ΔHcal = 56.4 kcal/mol, ΔHvH = 60.3 kcal/mol for the native protein.
100 °C cannot be collected in water in the MC2 calorimeter.
Notably, the unfolding of the native Sac7 proteins is
remarkably reversible, as indicated by essentially 100%
reproducibility of successive scans on the same sample
following cooling. The recombinant Sac7d protein unfolds
at pH 6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA)
with a Tm of 92.7 °C, or approximately 7 °C less than the
native.

A reliable analysis of the DSC endotherms requires a more
complete delineation of the endotherm which can be obtained
by lowering the pH and increasing the salt concentration to
shift the endotherms to lower temperature. At pH 4.0 (0.05
M potassium acetate, 0.3 M KCl) the native protein unfolds
with a Tm of 86.8 °C (Figure 10). The endotherm can be fit
with a van't Hoff enthalpy of 60.3 kcal/mol and a calorimetric
enthalpy of 56.4 kcal/mol, i.e., a ΔHcal/ΔHvH of 0.94,
indicating that the native protein exists as a monomer under
these conditions and unfolds in an all-or-none fashion with
no significant, populated intermediates.

The recombinant Sac7d protein similarly unfolds reversibly
at pH 4.0 (0.05 M potassium acetate, 0.3 M KCl) but with
a midpoint temperature of 80.3 °C (Figure 10), or 6.5 °C
less than the native protein. It unfolds with a van't Hoff
enthalpy of 49.6 kcal/mol, and a calorimetric enthalpy of
53.0 kcal/mol, i.e., a ΔHcal/ΔHvH of 1.07. The identity, within
experimental error, of the calorimetric and van't Hoff
enthalpies indicates that the recombinant protein also exists
as a monomer under these conditions and unfolds via a
two-state reaction.
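
The two-state argument rests on the ratio of calorimetric to van't Hoff enthalpy being close to 1. As a small worked check of the numbers quoted for pH 4.0 (the dictionary layout and function name are illustrative choices made here), the ratios and the difference in midpoint temperature can be recomputed as follows:

```python
# DSC fit parameters at pH 4.0 as quoted in the text (Tm in deg C, enthalpies in kcal/mol).
dsc_fits = {
    "native": {"Tm": 86.8, "dH_cal": 56.4, "dH_vH": 60.3},
    "recombinant": {"Tm": 80.3, "dH_cal": 53.0, "dH_vH": 49.6},
}


def enthalpy_ratio(fit: dict) -> float:
    """Ratio of calorimetric to van't Hoff enthalpy; a value near 1 is
    consistent with a monomer unfolding in a two-state transition."""
    return fit["dH_cal"] / fit["dH_vH"]


for name, fit in dsc_fits.items():
    print(f"{name}: dH_cal/dH_vH = {enthalpy_ratio(fit):.2f}")

tm_difference = dsc_fits["native"]["Tm"] - dsc_fits["recombinant"]["Tm"]
print(f"Tm difference: {tm_difference:.1f} deg C")
```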

DISCUSSION

We report here the cloning and sequencing of two genes
from S. acidocaldarius coding for Sac7 proteins which
correspond to Sac7d and Sac7e. The sac7d and sac7e genes
differ at only 16 positions within the coding region
(underlined in Figure 3); three of these differences are transversions,
while the rest are transitions. The sac7d and sac7e genes
code for 66 and 65 amino acid proteins, respectively. The
deduced amino acid sequences are in complete agreement
with the published sequences for both proteins (Kimura et
al., 1984; Choli et al., 1988a) with the exception of initiator
methionines at the amino termini and an additional lysine
(Lys66) at the carboxy terminus of the Sac7d protein in the
deduced sequence. The additional lysine can be explained
either by a failure to discern the final lysine in the amino
acid sequencing of the Sac7d or by posttranslational
carboxy-terminal processing to produce the mature protein. It should
be noted that Sac7d, Sac7e, and Sso7d all terminate with at
least two lysine residues (Figure 1).
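
The 16 coding-region differences are classed as 3 transversions and, by subtraction, 13 transitions. A transition exchanges two purines (A, G) or two pyrimidines (C, T); a transversion exchanges a purine for a pyrimidine. The sketch below shows one way such a tally could be made from two aligned sequences. The function names are hypothetical and the two short sequences are made-up illustrations, not the sac7d and sac7e sequences.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify a single-base difference as a transition or a transversion."""
    if a == b:
        raise ValueError("bases are identical; no substitution to classify")
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Tally transitions and transversions between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a != b:
            counts[classify_substitution(a, b)] += 1
    return counts


# Made-up example: T->C at the third position (transition), C->A at the sixth (transversion).
example = count_substitutions("GATTACA", "GACTAAA")
print(example)
```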

The data presented here indicate that there are only two
Sac7 protein genes in S. acidocaldarius. Genes coding for
Sac7 proteins other than Sac7d and e could not be detected.
The failure to detect genes for the Sac7a and b proteins and
the fact that the proteins appear to be simply truncated at
the carboxy termini to various extents suggest that Sac7a
and b result from either posttranslational modification at the
carboxy terminus or by proteolysis during protein isolation
and purification.

Promoter elements consistent with the archaeal "A-box"
and "B-box" consensus sequences have been located
upstream of the sac7d and sac7e protein coding sequences. The
agreement of the "A-box" sequence of sac7d with the
consensus "A-box" sequence is greater than that for the
sac7e. This difference between the "A-box" of the promoter
elements in the two genes may explain the higher levels of
Sac7d relative to Sac7e in vivo (Grote et al., 1986).

There is significant sequence similarity in the regions of
sac7d and sac7e extending from the 5' end of the "A box"
to the initiation codon when the corresponding "A-" and "B-"
boxes are aligned. The two sequences also have similarly
placed pyrimidine rich regions downstream of their termination
codons. These regions show similarity to the transcription
termination signals described for the Sulfolobus
virus-like particle, SSV1, where transcription termination has been
shown to occur within pyrimidine-rich regions directly 3'
of the consensus TTTTTYT [reviewed in Brown et al.
(1989)]. Northern analysis of S. acidocaldarius RGJM RNA
probed with an oligonucleotide (oligonucleotide F, Table 1)
complementary to the common sequence at residues 305-324
of the two sac7 genes (Figure 3) showed hybridization
to a single size of transcripts (Shao and Gupta, unpublished
results), indicating that both transcripts terminate in similarly
placed regions. Thus, it is likely that the conserved
oligopyrimidine sequences of the two genes contain the transcription
termination signals.

Although the regions associated with transcription termination
are highly homologous, the sequences between these
regions and the termination codons are significantly different
in the sac7d and sac7e genes. Similarly, though the regions
encompassing the putative core promoter elements in the two
genes ("A-" and "B-" boxes) share extensive homology, the
sequences 5' of the "A-box" show less similarity. It would
appear that sufficient time has elapsed since the supposed
original gene duplication for the two sequences to diverge.
The conservation of cis-regulatory elements along with
coding regions in the two genes indicates that there is a
selective pressure to maintain not only the expression of both
gene products but also a large part of their sequence. It is
not clear if there is more than one form of the Sso7 proteins.

A typical ribosome binding site sequence upstream of
initiator ATG is not observed in either of the two sac7 genes

Figure 11: Potential secondary structures for the 5'-terminal
regions of the sac7 RNA transcripts determined using Mulfold
(Jaeger et al., 1989a,b; Zuker, 1989). Initiator codons are shown
in lower case. Putative ribosome binding sequences GGUGA and
AGGU are indicated in bold and underlined formats, respectively.
Note that the AGGU sequences within the two transcripts are
located at different positions.

(Figure 3). This is not unusual, since many other Sulfolobus
genes also lack these sites (Amils et al., 1993; Dalgaard &
Garrett, 1993). However, potential ribosome binding sites
are observed downstream of the initiator codons of the two
sac7 genes which have precedents in other archaea. The
ribosome binding sites in certain halobacterial genes, which
have very short or no 5' untranslated regions, occur within
loops of potential hairpin structures in the 5' regions of the
transcripts (Brown et al., 1989; Amils et al., 1993). The
hairpin arrangement probably exposes these sites for
interaction with 16S rRNA. We note that the 5' regions of the
two sac7 transcripts can be folded into secondary structures
as shown in Figure 11. The sequence UCACCU near the 3'
end of 16S rRNA of Sulfolobus (Woese et al., 1984; Olsen
et al., 1985) potentially can either form five base pairs with
GGUGA within codons 1-3 or form four base pairs with
AGGU within codons 3-4 of the sac7d transcript.
Corresponding sequences in the sac7e transcript are GGCAA and
AAGU, respectively, which cannot form similar pairs with
the 16S rRNA. However, further downstream in the sac7e
transcript, there is AGGU within codons 5-6, which can
form four base pairs with the same UCACCU sequence of
the 16S rRNA; the corresponding site in sac7d is the less
efficient AAGU. Parts of these potential ribosome binding
sites do occur within single-stranded regions (Figure 11), as
are the cases for the above mentioned halobacterial genes.
The differences between the sequences and locations of the
potential ribosome binding sites of the two sac7 transcripts,
along with the previously mentioned differences in the
"A-box" sequences, may also explain the higher synthesis of
Sac7d protein.
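
The base-pair counts quoted above (five for GGUGA, four for AGGU, and no comparable pairing for GGCAA or AAGU) can be checked by comparing each candidate site with the reverse complement of the 16S rRNA sequence UCACCU. The sketch below is an illustration under simplifying assumptions made here: only contiguous Watson-Crick pairs are counted, G·U wobble pairs are ignored, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_pairing(site: str, anti_sd: str = "UCACCU") -> int:
    """Length of the longest contiguous stretch of `site` that can pair,
    antiparallel, with the 16S rRNA sequence `anti_sd`."""
    target = reverse_complement(anti_sd)  # AGGUGA for UCACCU
    best = 0
    for start in range(len(site)):
        for end in range(start + 1, len(site) + 1):
            if site[start:end] in target:
                best = max(best, end - start)
    return best


for candidate in ("GGUGA", "AGGU", "GGCAA", "AAGU"):
    print(candidate, longest_pairing(candidate))
```

Counting only contiguous Watson-Crick pairs is a deliberately crude measure; it reproduces the five and four base pairs cited for the sac7d sites and shows that the corresponding sac7e sequences pair over only two bases.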

Kimura et al. (1984) have previously noted that the
clustering of lysines in the amino terminus of these proteins
is reminiscent of that observed in eukaryotic HMG proteins.
Choli et al. (1988b) have also pointed out a slight sequence
similarity with E2A DNA-binding protein from adenovirus.
An extensive search of the currently available sequence
databases showed no significant homologies between the
Sac7d protein and any known chromatin or DNA-binding
protein. A BLAST search using the Sac7d sequence picked
up a 100% homology with the amino-terminal sequence (only
12 amino-terminal residues are known) of a small protein
(accession number S21168) from S. solfataricus which
apparently catalyzes disulfide bond formation (Guagliardi
et al., 1992). This report should be viewed with caution due
to the loss of activity upon cation exchange chromatography
of the protein. BLAST also picked up a high homolo
a reported p2 ribonuclease (Fusi et al., 1993) fro
solfataricus with a sequence identical to the Sso7d pr
(Choli et al., 1988a). RNase activity for the 7 kDa pro
is surprising and remains to be confirmed. Prelimi
experiments indicate that the recombinant Sac7d protein
not have RNase activity (Edmondson and Shriver, un
lished results). The BLAST search also picked up
weak homology with the 30S ribosomal protein S5 fro
coli (P02356) and heat shock protein X16 from the Afr
clawed frog (A22175). A FASTA search using the S
sequence revealed some homology with elongation f
\-6 (P29692), 30S ribosomal protein S8 (P24353), and D
directed RNA polymerase subunit A' (P31813). A PRO
search using the Sac7d sequence revealed phosphocre
kinase phosphorylation sites at residues 17-19 (TSK)
42 (TGR), and 46-48 (SEK), and creatine kina
phosphorylation sites at 33-36 (TYDD), and 46-49 (SE
A BLOCKS analysis provided a single meaningful m
with ribosomal S5 protein.

We have expressed the sac7d gene in the tightly contro
BL21(DE3)pLysS E. coli expression system develope
Studier et al. (1990) using the pET series of plasm
Accumulation of the sac7d gene product appears to be l
in E. coli. This is indicated perhaps most clearly by
inability to establish the pET-3b/sac7d construct
BL21(DE3). The additional regulation provided by the
lysozyme inhibition of T7 polymerase appears to be requi
The purified, recombinant protein can be isolated
reasonable yield, e.g., typically, about 1 mg of protein p
of wet weight E. coli cells is obtained, or approximately n
that obtained for the native protein from S. acidocaldar
We have been unsuccessful in expressing the sac7e g
possibly due to its usage of codons rare in E. coli.

The recombinant Sac7d protein appears to be essenti
identical to the native Sac7 proteins in all respects exc
for stability. The UV spectral extinction coefficients
identical, as are the fluorescence excitation and emiss
spectra. This is perhaps not surprising given that both
largely due to a single tryptophan on the surface of
protein (Edmondson, Qiu, and Shriver, manuscript submit
[see also Baumann et al. (1994) for the structure of Sso
although the two tyrosines should be sensitive to differen
in structure. CD spectra are more sensitive to differen
in secondary structure content, and the spectra of the
proteins are essentially identical, again indicating sim
structures for native and recombinant protein.

Analyses of the CD spectra using the variable seleci 
method of Johnson (Manavalan & Johnson, 1987) indi( 
that Sac7d consists of 3 1 % helix and 22-25% )5-sheet. 1 
differs from the' 52% a-helix, 12% )S^sheet predicted 
sequence analysis algorithms in this work and the I 
a-helix, 15%)5-sheet predicted by Choli et al. (1988a) us 
the average of four different prediction methods. All of th 
methods significantly underestimate the amount of /3-sl 
in Sac7d (42%) as determined from the NMR solui 
structure (Edmondson, Qiu, and Shriver, manuscript subi 
ted) [see also Baumann et al. (1994)]. However, the hel: 
content determined by CD (31%) is close to that of the NI 
solution structure (22% α-helix, 11% 3_10-helix). An anal; 
of the CD spectrum of Sac7e (Dijk & Reinhardt, 1986) us 
the PG method (Provencher & Glockner, 1981) gave a mi 
better estimate of β-sheet content (44%) but underestima 
the helical content (15%). The CD spectrum reported for 
Sac7e (Dijk & Reinhardt, 1986) differs quantitatively from 
that of native Sac7 and recombinant Sac7d presented here. 
Further, the inability of the CD analyses to accurately 
estimate the secondary structure content suggests that at least 
part of the secondary structure contributions to the CD 
spectra of the Sac7 proteins are not well represented in these 
sets of reference proteins. 

A more detailed, atomic level comparison of the structures 
of the recombinant and native proteins can be obtained from 
NMR. The "fingerprint" region of double-quantum filtered 
COSY spectra of proteins shows the chemical shift correlations 
of alpha and NH protons and is exquisitely sensitive 
to the structure of the protein [see, for example, Wishart et 
al. (1992)]. This permits a qualitative comparison of the 
structure of the backbone of the two proteins which is more 
detailed than that provided by optical spectra comparisons. 
The fingerprint regions of native and recombinant Sac7d 
protein are remarkably similar, indicating that the two 
proteins have very similar backbone folding patterns. 

The binding of the Sac7 proteins to double stranded DNA 
leads to a dramatic decrease in intrinsic tryptophan fluores- 
cence. The large signal allows for essentially noise-free 
titrations and accurate comparisons of the native and 
recombinant protein binding function. The data presented 
here indicate an affinity of 2 x lO"' M"' and site size of 33 
base pairs for poly[dGdC]-poly[dGdC]. The agreement of 
quantitative binding parameters obtained for the native and 
recombinant proteins is additional evidence for essentially 
identical global folds for the two proteins. These binding 
studies are the first quantitative analysis of the binding of 
the Sac7 proteins to DNA. 
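
An affinity and a site size together define a binding isotherm for a ligand that covers several base pairs of a DNA lattice. As a minimal sketch, assuming the noncooperative McGhee-von Hippel model (the fitting model is not named in this section), the following Python function returns the free protein concentration needed to reach a given binding density. The function name and the values of K and n in the usage note are illustrative assumptions, not the parameters reported here.

```python
def mcghee_von_hippel_free_ligand(nu, K, n):
    """Free ligand concentration (M) at binding density nu.

    Assumes the noncooperative McGhee-von Hippel model:
        nu / L = K * (1 - n*nu) * ((1 - n*nu) / (1 - (n - 1)*nu)) ** (n - 1)

    nu : bound ligand per base pair (must lie between 0 and 1/n)
    K  : intrinsic association constant for a single site (1/M)
    n  : site size in base pairs (need not be an integer)
    """
    if not 0.0 < nu < 1.0 / n:
        raise ValueError("nu must lie strictly between 0 and 1/n")
    open_fraction = 1.0 - n * nu
    # Probability factor that a free stretch of lattice is long enough to bind.
    gap_factor = (open_fraction / (1.0 - (n - 1) * nu)) ** (n - 1)
    nu_over_free = K * open_fraction * gap_factor
    return nu / nu_over_free


def fractional_saturation(nu, n):
    """Fraction of the lattice covered by ligand at binding density nu."""
    return n * nu
```

For example, with the hypothetical values K = 1e6 and n = 3, `mcghee_von_hippel_free_ligand(0.1, 1e6, 3)` gives the free protein concentration at which 30% of the lattice is covered. For n = 1 the expression reduces to the simple Scatchard form.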

Various prior studies of the 7 kDa DNA-binding proteins 
from Sulfolobus have characterized the binding to nucleic 
acids in a qualitative manner. Electron micrographs of the 
7 kDa proteins from S. acidocaldarius complexed with DNA 
indicated that the helix becomes increasingly compacted with 
increasing ratios of protein to DNA (Dijk & Reinhardt, 1986; 
Lurz et al., 1986). Filter binding studies confirmed that the 
7 kDa proteins had an affinity for pBR322 DNA even at 
relatively high salt concentrations (e.g., 0.265 M NaCl) which 
was comparable to that observed for E. coli HU protein 
(Grote et al., 1986; Choli et al., 1988a). Characterization 
of the affinity for DNA in this work was in terms of percent 
bound at a specific ratio of protein to DNA. DNA-melting 
studies have also been performed on a small DNA-binding 
protein from S. acidocaldarius, HSNP-C, with an amino acid 
composition similar to the Sac7e protein, although the 
sequence has not been presented. The protein increases the 
Tm of double-stranded DNA (Reddy & Suryanarayana, 1989). 
In addition, this protein demonstrated a significant quenching 
of its intrinsic tryptophan fluorescence upon DNA binding, 
although no quantitative analysis of the titrations was 
performed. 

Baumann et al. (1994) have recently presented some 
fluorescence binding data for the homologous Sso7 proteins 
from S. solfataricus. A quantitative analysis of the titrations 
was not performed, but a visual inspection of the data 
indicates a binding site size for double-stranded DNA of six 
base pairs in low salt (0.02 M Tris, pH 7.4), nearly twice 
that presented here. Assuming a site size of 3-6 base pairs, 
the binding affinity in low salt is approximately 0.5 to 1 x 
10^ M^-1. The thermal stability of poly[dIdC]-poly[dIdC] was 
increased by approximately 40 °C in 5 mM Tris (pH 7.0). 

The unfolding of both the native and recombinant proteins 
is reversible, allowing for detailed, accurate characterization 
of the thermodynamics of folding. In contrast to all other 
physical parameters studied here, the energetics of folding 
of the recombinant Sac7d protein differs significantly from 
that of the native Sac7 proteins. The native protein unfolds 
at pH 6.0 at 100 °C, remarkable given the absence of any 
metal cofactors or disulfides. Surprisingly, the recombinant 
protein unfolds with a Tm 6.5 °C less than the native. The 
lower enthalpy of unfolding of the recombinant protein is 
not surprising and most likely results from a positive heat 
capacity change associated with unfolding. Any shift to 
lower temperature of an endotherm associated with a positive 
ΔCp will lead to a decrease in enthalpy since 

$$\Delta C_p = \left(\frac{\partial \Delta H}{\partial T}\right)_p$$

It is generally thought that a positive ΔCp of unfolding is 
due to the exposure of internal hydrophobic residues (Sturtevant, 
1977; Privalov & Gill, 1988). The magnitude of the 
change observed here is consistent with that observed for 
other globular proteins (Privalov & Gill, 1988). 
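
The link between a lower transition temperature and a lower unfolding enthalpy can be made concrete with a short calculation. The sketch below assumes a temperature-independent heat capacity change, so that integrating ΔCp = dΔH/dT gives ΔH(T2) = ΔH(T1) + ΔCp (T2 - T1). The function name and the numerical ΔH and ΔCp values in the usage note are hypothetical and are not measurements from this work; only the 100 °C transition and the 6.5 °C shift come from the text.

```python
def enthalpy_at_shifted_tm(dh_ref, t_ref, t_new, delta_cp):
    """Unfolding enthalpy at a shifted transition temperature.

    Assumes delta_cp is constant over the interval, so that
        dH(t_new) = dH(t_ref) + delta_cp * (t_new - t_ref)

    dh_ref   : unfolding enthalpy at t_ref (kcal/mol)
    t_ref    : reference transition temperature (deg C or K)
    t_new    : shifted transition temperature (same units as t_ref)
    delta_cp : heat capacity change of unfolding (kcal/mol/K)
    """
    return dh_ref + delta_cp * (t_new - t_ref)
```

With a hypothetical native enthalpy of 60 kcal/mol at 100 °C and a hypothetical ΔCp of 0.5 kcal/mol/K, a transition 6.5 °C lower gives `enthalpy_at_shifted_tm(60.0, 100.0, 93.5, 0.5)`, which is 56.75 kcal/mol. Any positive ΔCp gives a decrease of this kind for a downward shift in Tm.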

Maras et al. (1992) have previously noted that specific 
lysine monomethylation of glutamate dehydrogenase from 
S. solfataricus might be responsible for enhanced thermal 
stability of this enzyme relative to homologous mesophile 
forms. Baumann et al. (1994) have presented mass spectroscopic 
evidence correlating methylation of the Sso7 
protein with growth temperature, and they have suggested 
that such a modification might be related to the stability of 
the protein. The most straightforward way to determine if 
methylation increases the thermostability of the protein would 
be to compare the stabilities of the protein in its methylated 
and unmethylated forms. Demethylation of the native protein 
is not a trivial control experiment given the lack of 
commercially available demethylases and, most importantly, 
the specificity of reported demethylases (Paik & Kim, 1980). 
In the absence of a demethylase, the preparation of an 
unmethylated form is best accomplished using recombinant 
protein. We have demonstrated here a significant difference 
in the thermostability of native and recombinant Sac7 protein. 
The only known difference between these proteins is the 
ε-aminomonomethylation of lysines 5 and 7 in the native 
protein and the initiating methionine in the recombinant 
protein. The lack of Lys66 in the reported amino acid 
sequence of the native protein is presumably a sequencing 
error, and this will be investigated in the NMR analysis of 
the native protein. No other posttranslational modification, 
such as phosphorylation or glycosylation, of the native or 
recombinant Sac7 proteins was detectable. The current 
evidence, therefore, strongly indicates that Sulfolobus can 
increase the thermostability of some of its proteins by specific 
lysine monomethylation. 

We note that the level of specific methylation of Sac7 is 
variable and incomplete, i.e., the native preparation is 
heterogeneous (Kimura et al., 1984; Choli et al., 1988a,b). 
Choli et al. (1988b) report that the degree of monomethylation 
of lysine 4 is 70%, 25%, and 20% in native Sac7a, 
Sac7b, and Sac7d, respectively; while that for lysine 6 is 
50%, 40%, and 50%, respectively. Heterogeneity would be 
expected to lead to broadening of the endotherm, rather than 
narrowing (see Figure 10). It would appear, therefore, that 
stabilization might not require complete methylation of the 
specific lysines. 

Interestingly, we have been unable to increase the stability 
of the recombinant Sac7d protein by nonspecific, reductive 
methylation (McCrary and Shriver, unpublished results), a 
process which leads to predominantly dimethylation (Means 
& Feeney, 1971). Monomethylation changes the pKa of the 
ε-amino group from 9.25 to 10.63, while dimethylation has 
little further effect, giving a pKa of 10.78 (Paik & Kim, 1980). 
Trimethylation returns the pKa to 9.8. Given the small 
change in pKa and the fact that the difference is observed even 
at pH 4.0, it is doubtful that an effect of monomethylation 
on stability might be electrostatic in origin. A structural 
explanation of the difference in stability must await a more 
detailed comparison of the structures of the native and 
recombinant proteins. The spectroscopic data presented here 
would indicate that the structural differences are slight. 
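
The argument against an electrostatic origin can be checked numerically. As a rough sketch, assuming each ε-amino group titrates as an isolated single site (Henderson-Hasselbalch behavior, which ignores the protein environment), the fraction of protonated lysine side chains at pH 4.0 can be computed from the pKa values quoted above. The function name is illustrative.

```python
def fraction_protonated(ph, pka):
    """Fraction of a basic group in its protonated (charged) form.

    Assumes ideal single-site titration:
        fraction = 1 / (1 + 10 ** (ph - pka))
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values quoted in the text (Paik & Kim, 1980)
PKA_BY_METHYLATION = {
    "unmethylated": 9.25,
    "monomethyl": 10.63,
    "dimethyl": 10.78,
    "trimethyl": 9.8,
}

if __name__ == "__main__":
    for form, pka in PKA_BY_METHYLATION.items():
        print(form, fraction_protonated(4.0, pka))
```

Under this simplified model every form is more than 99.99% protonated at pH 4.0, so methylation would change the net charge of the lysine by a negligible amount at that pH, in line with the reasoning in the text.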